{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 13 Measuring Text Similarities\n",
    "## 13.1. Simple Text Comparison\n",
    "\n",
    "Suppose we want to compare 3 simple texts:\n",
    "\n",
    "* `text1`: _She sells seashells by the seashore._\n",
    "* `text2`: _\"Seashells! The seashells are on sale! By the seashore.\"_\n",
    "* `text3`: _She sells 3 seashells to John, who lives by the lake._\n",
    "\n",
    "Our goal is to determine whether `text1` is more similar to `text2` or to `text3`. We'll start by assigning the texts to 3 variables.\n",
    "\n",
    "**Listing 13. 1. Assigning texts to variables**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "text1 = 'She sells seashells by the seashore.'\n",
    "text2 = '\"Seashells! The seashells are on sale! By the seashore.\"'\n",
    "text3 = 'She sells 3 seashells to John, who lives by the lake.'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we need to quantify the differences between texts. One basic approach is to simply count the words shared between each pair of texts. The approach requires us to split each text into a list of words.\n",
    "\n",
    "**Listing 13. 2. Splitting texts into words**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Words in text 1\n",
      "['She', 'sells', 'seashells', 'by', 'the', 'seashore.']\n",
      "\n",
      "Words in text 2\n",
      "['\"Seashells!', 'The', 'seashells', 'are', 'on', 'sale!', 'By', 'the', 'seashore.\"']\n",
      "\n",
      "Words in text 3\n",
      "['She', 'sells', '3', 'seashells', 'to', 'John,', 'who', 'lives', 'by', 'the', 'lake.']\n",
      "\n"
     ]
    }
   ],
   "source": [
    "words_lists = [text.split() for text in [text1, text2, text3]]\n",
    "words1, words2, words3 = words_lists\n",
    "\n",
    "for i, words in enumerate(words_lists, 1):\n",
    "    print(f\"Words in text {i}\")\n",
    "    print(f\"{words}\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Our texts have been split into words. However, an accurate word comparison isn't possible yet, because of inconsistent capitalization and punctuation. For instance, `text2` yields both `'\"Seashells!'` and `'seashells'`, which should count as the same word. We can eliminate the capitalization inconsistency by calling the built-in `lower` string method, which converts a string to lowercase. Furthermore, we can strip punctuation from a word by calling `word.replace(punctuation, '')`, where `punctuation` is set to a character such as `'!'` or `'\"'`.\n",
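    "\n",
    "As an aside, the same cleanup can also be written with the built-in `str.maketrans` and `str.translate` methods, which delete every listed punctuation character in a single pass. This is an alternative we're assuming for illustration, not the approach used in this chapter, and `simplify_word` is a hypothetical name:\n",
    "\n",
    "```python\n",
    "# Translation table that maps each listed punctuation character to None (deletion)\n",
    "PUNCTUATION_TABLE = str.maketrans('', '', '.,!?\"')\n",
    "\n",
    "def simplify_word(word):\n",
    "    # Delete punctuation in one pass, then lowercase the result\n",
    "    return word.translate(PUNCTUATION_TABLE).lower()\n",
    "\n",
    "print([simplify_word(word) for word in ['\"Seashells!', 'The', 'John,']])\n",
    "# ['seashells', 'the', 'john']\n",
    "```\n",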
    "\n",
    "**Listing 13. 3. Removing case-sensitivity and punctuation**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Words in text 1\n",
      "['she', 'sells', 'seashells', 'by', 'the', 'seashore']\n",
      "\n",
      "Words in text 2\n",
      "['seashells', 'the', 'seashells', 'are', 'on', 'sale', 'by', 'the', 'seashore']\n",
      "\n",
      "Words in text 3\n",
      "['she', 'sells', '3', 'seashells', 'to', 'john', 'who', 'lives', 'by', 'the', 'lake']\n",
      "\n"
     ]
    }
   ],
   "source": [
    "def simplify_text(text):\n",
    "    for punctuation in ['.', ',', '!', '?', '\"']:\n",
    "        text = text.replace(punctuation, '')\n",
    "    \n",
    "    return text.lower()\n",
    "\n",
    "for i, words in enumerate(words_lists, 1):\n",
    "    for j, word in enumerate(words):\n",
    "        words[j] = simplify_text(word)\n",
    "        \n",
    "    print(f\"Words in text {i}\")\n",
    "    print(f\"{words}\\n\") "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For now, we're only interested in comparing unique words. We can eliminate all duplicate words by converting each word list into a set.\n",
    "\n",
    "**Listing 13. 4. Converting word-lists to sets**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Unique Words in text 1\n",
      "{'seashore', 'sells', 'the', 'she', 'seashells', 'by'}\n",
      "\n",
      "Unique Words in text 2\n",
      "{'seashore', 'sale', 'the', 'are', 'seashells', 'on', 'by'}\n",
      "\n",
      "Unique Words in text 3\n",
      "{'3', 'sells', 'who', 'lake', 'the', 'she', 'to', 'lives', 'seashells', 'by', 'john'}\n",
      "\n"
     ]
    }
   ],
   "source": [
    "words_sets = [set(words) for words in words_lists]\n",
    "for i, unique_words in enumerate(words_sets, 1):\n",
    "    print(f\"Unique Words in text {i}\")\n",
    "    print(f\"{unique_words}\\n\") "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Given two Python sets `set_a` and `set_b`, we can extract all overlapping elements (their **intersection**) by running `set_a & set_b`. Let's use the `&` operator to count the overlapping words between text pairs `(text1, text2)` and `(text1, text3)`.\n",
    "\n",
    "**Listing 13. 5. Extracting overlapping words between two texts**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Texts 1 and 2 share these 4 words:\n",
      "{'seashore', 'seashells', 'the', 'by'}\n",
      "\n",
      "Texts 1 and 3 share these 5 words:\n",
      "{'sells', 'the', 'she', 'seashells', 'by'}\n",
      "\n"
     ]
    }
   ],
   "source": [
    "words_set1 = words_sets[0]\n",
    "for i, words_set in enumerate(words_sets[1:], 2):\n",
    "    shared_words = words_set1 & words_set\n",
    "    print(f\"Texts 1 and {i} share these {len(shared_words)} words:\")\n",
    "    print(f\"{shared_words}\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's count and print all diverging words between text pairs `(text1, text2)` and `(text1, text3)`. We'll use the `^` operator, which returns the **symmetric difference** of two sets: the elements present in exactly one of them.\n",
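    "\n",
    "The `^` result (the *symmetric difference*) is the same as the union of the two one-sided set differences, `set_a - set_b` and `set_b - set_a`. A small sketch on hand-written sets, assumed for illustration:\n",
    "\n",
    "```python\n",
    "set_a = {'she', 'sells', 'seashells'}\n",
    "set_b = {'seashells', 'the', 'seashore'}\n",
    "\n",
    "# Keep only the words that appear in exactly one of the two sets\n",
    "diverging = set_a ^ set_b\n",
    "assert diverging == (set_a - set_b) | (set_b - set_a)\n",
    "print(sorted(diverging))  # ['seashore', 'sells', 'she', 'the']\n",
    "```\n",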
    "\n",
    "**Listing 13. 6. Extracting diverging words between two texts**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Texts 1 and 2 don't share these 5 words:\n",
      "{'sells', 'sale', 'she', 'are', 'on'}\n",
      "\n",
      "Texts 1 and 3 don't share these 7 words:\n",
      "{'3', 'seashore', 'who', 'lake', 'to', 'lives', 'john'}\n",
      "\n"
     ]
    }
   ],
   "source": [
    "for i, words_set in enumerate(words_sets[1:], 2):\n",
    "    diverging_words = words_set1 ^ words_set\n",
    "    print(f\"Texts 1 and {i} don't share these {len(diverging_words)} words:\")\n",
    "    print(f\"{diverging_words}\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Suppose we combined all the overlapping words and all the diverging words between 2 texts. The combination should contain every unique word across the 2 texts. This aggregation of all unique words is called a **union**. Let's use the `|` operator to count the total unique words across text pairs `(text1, text2)` and `(text1, text3)`.\n",
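    "\n",
    "The shared and diverging words never overlap, so together they partition the union. A quick check on hand-written sets, assumed for illustration:\n",
    "\n",
    "```python\n",
    "set_a = {'she', 'sells', 'seashells'}\n",
    "set_b = {'seashells', 'the', 'seashore'}\n",
    "\n",
    "shared = set_a & set_b\n",
    "diverging = set_a ^ set_b\n",
    "# No word is both shared and diverging\n",
    "assert not shared & diverging\n",
    "# Together they make up every unique word\n",
    "assert shared | diverging == set_a | set_b\n",
    "print(len(set_a | set_b))  # 5\n",
    "```\n",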
    "\n",
    "**Listing 13. 7. Extracting the union of words between two texts**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Together, texts 1 and 2 contain 9 unique words. These words are:\n",
      " {'seashore', 'sells', 'sale', 'the', 'she', 'on', 'are', 'seashells', 'by'}\n",
      "\n",
      "Together, texts 1 and 3 contain 12 unique words. These words are:\n",
      " {'3', 'lake', 'to', 'lives', 'who', 'by', 'john', 'seashore', 'sells', 'the', 'she', 'seashells'}\n",
      "\n"
     ]
    }
   ],
   "source": [
    "for i, words_set in enumerate(words_sets[1:], 2):\n",
    "    total_words = words_set1 | words_set\n",
    "    print(f\"Together, texts 1 and {i} contain {len(total_words)} \" \n",
    "          f\"unique words. These words are:\\n {total_words}\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Together, `text1` and `text3` contain 12 unique words: 5 of them overlap and 7 of them diverge. Since every unique word either overlaps or diverges, the overlap and divergence percentages are complementary: they always sum to 100% of the total unique word count. Let's output these percentages for text pairs `(text1, text2)` and `(text1, text3)`.\n",
    "\n",
    "**Listing 13. 8. Extracting the percentage of shared words between two texts**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Together, texts 1 and 2 contain 9 unique words. \n",
      "44.44% of these words are shared. \n",
      "55.56% of these words diverge.\n",
      "\n",
      "Together, texts 1 and 3 contain 12 unique words. \n",
      "41.67% of these words are shared. \n",
      "58.33% of these words diverge.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "for i, words_set in enumerate(words_sets[1:], 2):\n",
    "    shared_words = words_set1 & words_set\n",
    "    diverging_words = words_set1 ^ words_set\n",
    "    total_words = words_set1 | words_set\n",
    "    assert len(total_words) == len(shared_words) + len(diverging_words)\n",
    "    percent_shared = 100 * len(shared_words) / len(total_words)\n",
    "    percent_diverging = 100 * len(diverging_words) / len(total_words)\n",
    "    \n",
    "    print(f\"Together, texts 1 and {i} contain {len(total_words)} \" \n",
    "          f\"unique words. \\n{percent_shared:.2f}% of these words are \"\n",
    "          f\"shared. \\n{percent_diverging:.2f}% of these words diverge.\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The fraction of shared words among all unique words is a similarity metric called the **Jaccard similarity**.\n",
    "\n",
    "### 13.1.1. Introduction to the Jaccard Similarity\n",
    "Let's define a function to compute the Jaccard similarity.\n",
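    "\n",
    "Writing $A$ and $B$ for the sets of unique words in 2 texts, the Jaccard similarity is:\n",
    "\n",
    "$$J(A, B) = \\frac{|A \\cap B|}{|A \\cup B|}$$\n",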
    "\n",
    "**Listing 13. 9. Computing the Jaccard similarity**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The Jaccard similarity between 'She sells seashells by the seashore.' and '\"Seashells! The seashells are on sale! By the seashore.\"' equals 0.4444.\n",
      "\n",
      "The Jaccard similarity between 'She sells seashells by the seashore.' and 'She sells 3 seashells to John, who lives by the lake.' equals 0.4167.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "def jaccard_similarity(text_a, text_b):\n",
    "    word_set_a, word_set_b = [set(simplify_text(text).split())\n",
    "                              for text in [text_a, text_b]]\n",
    "    num_shared = len(word_set_a & word_set_b)\n",
    "    num_total = len(word_set_a | word_set_b)\n",
    "    return num_shared / num_total\n",
    "\n",
    "for text in [text2, text3]:\n",
    "    similarity = jaccard_similarity(text1, text)\n",
    "    print(f\"The Jaccard similarity between '{text1}' and '{text}' \"\n",
    "          f\"equals {similarity:.4f}.\" \"\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Our implementation of the Jaccard similarity is functional, but not very efficient: it builds the union set `word_set_a | word_set_b` just to measure its size. We can skip that work, because each shared word is counted once in `len(word_set_a)` and once more in `len(word_set_b)`. Hence, the union size equals `len(word_set_a) + len(word_set_b) - num_shared`. Let's modify the function accordingly, while ensuring that our Jaccard output remains the same.\n",
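    "\n",
    "A quick sanity check of this identity, using the simplified unique words of `text1` and `text3` copied by hand from the earlier output:\n",
    "\n",
    "```python\n",
    "word_set_a = {'she', 'sells', 'seashells', 'by', 'the', 'seashore'}\n",
    "word_set_b = {'she', 'sells', '3', 'seashells', 'to', 'john',\n",
    "              'who', 'lives', 'by', 'the', 'lake'}\n",
    "\n",
    "num_shared = len(word_set_a & word_set_b)\n",
    "# Shared words are counted in both set sizes, so subtract them once\n",
    "num_total = len(word_set_a) + len(word_set_b) - num_shared\n",
    "assert num_total == len(word_set_a | word_set_b)\n",
    "print(num_shared, num_total)  # 5 12\n",
    "```\n",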
    "\n",
    "**Listing 13. 10. Efficiently computing the Jaccard similarity**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "def jaccard_similarity_efficient(text_a, text_b):\n",
    "    word_set_a, word_set_b = [set(simplify_text(text).split())\n",
    "                              for text in [text_a, text_b]]\n",
    "    num_shared = len(word_set_a & word_set_b)\n",
    "    num_total = len(word_set_a) + len(word_set_b) - num_shared\n",
    "    return num_shared / num_total\n",
    "    \n",
    "for text in [text2, text3]:\n",
    "    similarity = jaccard_similarity_efficient(text1, text)\n",
    "    assert similarity == jaccard_similarity(text1, text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We've improved our Jaccard function. Unfortunately, it still won't scale well. The remaining bottleneck is the set comparison `word_set_a & word_set_b`, which is too slow to execute across thousands of long, complicated texts. Perhaps we can speed up the computation by running it with NumPy instead.\n",
    "\n",
    "### 13.1.2. Replacing Words with Numeric Values\n",
    "Can we swap out words for numbers? Yes! We simply need to iterate over all the unique words in all the texts and number them 0, 1, 2, and so on, so that the `i`th unique word receives the value `i`. The mapping between words and their numeric values can be stored in a Python dictionary. We'll refer to this dictionary as our **vocabulary**.\n",
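    "\n",
    "One caveat: Python randomizes string hashing by default, so iterating over a set of words can produce a different order from one run to the next, and so can the word-to-number assignment below. If a reproducible vocabulary is needed, one option is to sort the words first. A minimal sketch, where `build_vocabulary` is a hypothetical helper:\n",
    "\n",
    "```python\n",
    "def build_vocabulary(word_sets):\n",
    "    # Union of all word sets, sorted so that the numbering is reproducible\n",
    "    all_words = set().union(*word_sets)\n",
    "    return {word: i for i, word in enumerate(sorted(all_words))}\n",
    "\n",
    "print(build_vocabulary([{'she', 'sells'}, {'sells', 'seashells'}]))\n",
    "# {'seashells': 0, 'sells': 1, 'she': 2}\n",
    "```\n",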
    "\n",
    "**Listing 13. 11. Assigning words to numbers in a vocabulary**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Our vocabulary contains 15 words. This vocabulary is:\n",
      "{'3': 0, 'lake': 1, 'to': 2, 'are': 3, 'lives': 4, 'who': 5, 'by': 6, 'john': 7, 'seashore': 8, 'sells': 9, 'sale': 10, 'the': 11, 'she': 12, 'seashells': 13, 'on': 14}\n"
     ]
    }
   ],
   "source": [
    "words_set1, words_set2, words_set3 = words_sets\n",
    "total_words = words_set1 | words_set2 | words_set3\n",
    "vocabulary = {word: i for i, word in enumerate(total_words)}\n",
    "value_to_word = {value: word for word, value in vocabulary.items()}\n",
    "print(f\"Our vocabulary contains {len(vocabulary)} words. \" \n",
    "      f\"This vocabulary is:\\n{vocabulary}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Given our vocabulary, we can convert any text into a 1-dimensional array of numbers. Mathematically, a 1D numeric array is called a **vector**. There are numerous ways of vectorizing text. One basic approach creates a binary vector with one element per vocabulary word: the element at index `vocabulary[word]` is 1 if `word` appears in the text, and 0 otherwise. Let's use this binary vectorization to convert all 3 texts into NumPy arrays.\n",
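    "\n",
    "The same idea can be packaged as a small reusable function. In this sketch, `binary_vector` and `toy_vocabulary` are hypothetical names introduced for illustration:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def binary_vector(words, vocabulary):\n",
    "    # One slot per vocabulary word: 1 if the word occurs in the text, else 0\n",
    "    vector = np.zeros(len(vocabulary), dtype=int)\n",
    "    for word in set(words):\n",
    "        vector[vocabulary[word]] = 1\n",
    "    return vector\n",
    "\n",
    "toy_vocabulary = {'by': 0, 'sea': 1, 'she': 2, 'the': 3}\n",
    "print(binary_vector(['she', 'by', 'the', 'the'], toy_vocabulary))  # [1 0 1 1]\n",
    "```\n",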
    "\n",
    "**Listing 13. 12. Transforming words into binary vectors**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXEAAAEhCAYAAACJCZBTAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAjvklEQVR4nO3de5wddX3/8dd7dxMaCgISSCIkKdKAoFEE5NIiVZR7MVVUQCxqjREVa+XXCrVCAOUnSOuFS9muyEVLQREUTKIIKImCQAjGQLhG1CRggoSKgNFc+PSPmY2HZWd35uzsnpmz72ce88iZc+Z8zmdmdz/zPd/5zowiAjMzq6eOVidgZmbNcxE3M6sxF3EzsxpzETczqzEXcTOzGnMRNzOrMRdxM7MRIOlSSU9Iui/jdUk6X9IySUsk7Zknrou4mdnIuBw4bIDXDwempdMs4OI8QV3EzcxGQEQsAJ4aYJEZwFcjcQewtaRJg8V1ETczq4YdgBUN8yvT5wbUNWzpDJuHfZ2AGhg3ZXYpcdYuP7OUOGXlU5Z2Xa8ylLVtErtoKO8eN+W43PXmDyuu/iBJN0ivnojoKfBx/eU66OfXsIibmY0MKX9nRVqwixTtvlYCkxvmdwQeH+xN7k4xM8sgOnJPJbgBOCEdpbIf8HRE/HqwN7klbmaWoUhLfPBYugp4AzBe0kpgNjAGICK6gXnAEcAy4PfA+/LEdRE3M8tQZhGPiOMGeT2AjxSN6yJuZpZBGtJx0RHhIm5mlqn6hw1dxM3MMnR0VL9EVj9DM7MWKWnUybByETczy1Dmgc3h4iJuZpbBRdzMrMZcxM3MaqxDna1OYVAu4mZmGdwSNzOrMRdxM7NacxE3M6stt8TNzGrMRdzMrMY6VP0SWf0MzcxaxFcxNDOrMXenmJnVmC+AZWZWY26Jm5nVmIu4mVmNyaNTzMzqyy1xM7Ma8xBDM7Ma8+gUM7Mac3eKmVmddbo7xcysvmrQJ1797woFLFiwiEMPPZGDD55FT881jlPRXAC6z/sgv7qnm7tv+lzTMcrKp6xcyooD7bleVfqZ5ybln1pkwCIuaVtJi9NplaTHGubH5vkASZ8c4LWzJa2Q9GzRxPvauHEjZ53VzSWXnMHcuRcxZ84Cli1b7jgVy6XX166Zz4wTzmnqvWXnU0YuZcZp1/Wq0s88t44CU4sM+NERsSYi9oiIPYBu4Au98xGxLudnZBZx4DvAPjnjDGjJkkeYOnUSkydPZOzYMRx55IHccsudjlOxXHrddteDPPXboe27y8qnjFzKjNOu61Wln3leIeWeWqXw/kPSXpLmS1ok6UZJkyRtJekhSbumy1wl6QOSzgHGpS33K/vGiog7IuLXJawHq1evYeLE8ZvmJ0zYltWr1zhOxXIpU9XyKUu7rlcZRnzbqMDUIkUPbAq4AJgREb+RdAxwdkT8g6STgMslfQnYJiK+DCDppLQlP6wi4sXJNrF3bMc4VcqlTFXLpyztul5lGPFt01n9w4ZFM9wMeBVwk6TFwKeAHQEi4ibgXuAiYGaJOSJplqS7Jd3d0/P1fpeZOHE8q1Y9uWl+9eo1bL/9Swt/VjvGqVIuZapaPmVp1/Uqw4hvmxq0xIsWcQFLG/rFp0fEIQBKRsXvBqwFSt2qEdETEXtHxN6zZh3T7zLTp0/jl798nBUrVrFu3Xrmzl3AQQcV725vxzhVyqVMVcunLO26XmUY8W3TofxTixTtTvkjsJ2k/SPiJ5LGALtExFLg48ADJAcyL02XWQ+slzQmfTxsuro6Of30E5k5czYbNz7P0Ue/mWnTpjpOxXLpdcUFH+X1++/G+G22ZNmdF/Lpz3+TK75+a0vyKSOXMuO063pV6WeeWw26sdRfH1O/C0pnAM8CNwPnA1uR7AS+CMwHrgf2iYhnJH0eeCYiZks6F3gLcE9EHN8n5ueAdwEvAx4HLomIMwbO5OF8CVtLjZsyu5Q4a5efWUqcsvIp
S7uuVxnK2jaJXYZUhacd8pXc9eaR77+/JRU/d0u8T3E9sJ9FdmtY9uSGx6cAp2TE/ATwibw5mJmNqJK7SSQdBnwJ6CRptJ7T5/WtgP8GppDU53+PiMsGiunT7s3MMkSJRVxSJ8nAj4OBlcBCSTdExP0Ni30EuD8ijpK0HfCQpCsHOi+n+uNnzMxapdwDm/sAyyLi0bQoXw3M6LNMAFsqGTe5BfAUsGGgoG6Jm5llKbc3ZQdgRcP8SmDfPstcCNxAcoxwS+CYiHh+oKBuiZuZZSlwAazG81nSaVbfaP18Qt8Dp4cCi0kGe+wBXCjpJQOl6Ja4mVmWAn3iEdED9AywyEpgcsP8jiQt7kbvA86JZNjgMkm/AF4B3JWZYu4MzcxGm3L7xBcC0yTtlF4F9liSrpNGy4E3AUiaAOwKPDpQULfEzcyylHiyT0RsSK8xdSPJEMNLI2KppBPT17uBT5Ncg+peku6XUyLiycyguIibmWUrua8iIuYB8/o8193w+HHgkCIxXcTNzLLU4LR7F3EzsyzVr+Eu4mZmWco8Y3O4uIibmWVxETczqzEXcTOzGvOBTTOzGqt+DXcRNzPL5O4UM7MacxE3M6uv6HQRNzOrLx/YNDOrMXenmJnVWA0u1u0ibmaWxd0pZmb1FZ3Vb4q7iJuZZal+DXcRNzPL5AObZmY15j5xM7Mac0vczKzGql/DXcTNzLJEV/WPbLqIm5llcZ+4mVmNVb8h7iJeBeOmzC4lztrlZ5YSx7KVtY3L+pmXoWrrVOa2Wbv8qqEFcEvczKzGPDrFzKzGXMTNzOrLN4UwM6sz94mbmdWYu1PMzGrMRdzMrMaqX8NdxM3MsvimEGZmdebuFDOzGqt+Da/DlQHMzFqjoyP/lIekwyQ9JGmZpFMzlnmDpMWSlkqaP1hMt8TNzDKUOUxcUidwEXAwsBJYKOmGiLi/YZmtgf8EDouI5ZK2HyyuW+JmZhmk/FMO+wDLIuLRiFgHXA3M6LPMu4DrImI5QEQ8MVhQF3EzswwdHco95bADsKJhfmX6XKNdgG0k3SppkaQTBgvq7hQzswxFulMkzQJmNTzVExE9jYv087boM98F7AW8CRgH/ETSHRHxcNbnuoibmWVQgb6KtGD3DLDISmByw/yOwOP9LPNkRDwHPCdpAfAaILOIuzvFzCxDyX3iC4FpknaSNBY4FrihzzLXA6+X1CVpc2Bf4IGBgrolbmaWocxzfSJig6STgBuBTuDSiFgq6cT09e6IeEDS94AlwPPAJRFx30BxXcTNzDLkHf+dV0TMA+b1ea67z/x5wHl5Y7qIm5llkK8nbmZWX0UObLaKi7iZWYYaNMRdxM3MsriIm5nVmIu4mVmN1eCeEC7iZmZZ3BI3M6sx+c4+Zmb15Za4mVmNuYibmdWYi7iZWY3VYXRKDVLMb8GCRRx66IkcfPAsenquaYs43ed9kF/d083dN32u6TzKyqXMOFVar7JyqVo+VdrGZeZT5s9rMOrIP7XKgB8tadv0rsuLJa2S9FjD/Ng8HyDpkxnPby5prqQH07s6n9PMCvTauHEjZ53VzSWXnMHcuRcxZ84Cli1bXvs4X7tmPjNOGNKmqdw6QbXWq4xcqphPlbZxWfmUGSePkq8nPiwGLOIRsSYi9oiIPYBu4Au98+mNPvPot4in/j0iXgG8FvhrSYfnjPkiS5Y8wtSpk5g8eSJjx47hyCMP5JZb7qx9nNvuepCnfvts4fcNRy5lxYFqrVcZuVQxnypt47LyKTNOHpJyT61S+EuApL0kzU9v4nmjpEmStpL0kKRd02WukvSBtHU9Lm25X9kYJyJ+HxE/TB+vA+4huV1RU1avXsPEieM3zU+YsC2rV6+pfZwytOM6OZ+R0Y7rVEQdWuJFD2wKuACYERG/kXQMcHZE/EN6x4rLJX0J2CYivgwg6aS0JZ8dVNoaOAr4UtEV6BXR936jzV0LuGpxytCO6wTOZyS04zoVUfZNIYZD0RQ3A14F3CRp
MfAp0tZzRNwE3AtcBMzMG1BSF3AVcH5EPJqxzCxJd0u6u6fn6/3GmThxPKtWPblpfvXqNWy//UvzplHZOGVox3VyPiOjHdepiA7ln1qWY8HlBSxt6BefHhGHAEjqAHYD1gJFfso9wCMR8cWsBSKiJyL2joi9Z806pt9lpk+fxi9/+TgrVqxi3br1zJ27gIMO2qdAGtWMU4Z2XCfnMzLacZ2KqEMRL9qd8kdgO0n7R8RPJI0BdomIpcDHSe7K/Eng0nSZ9cB6SWPSxy8g6TPAVhRouWfp6urk9NNPZObM2Wzc+DxHH/1mpk2bWvs4V1zwUV6//26M32ZLlt15IZ/+/De54uu3tiSXsuJAtdarjFyqmE+VtnFZ+ZQZJ48Ovbg7qWrUX59XvwtKZwDPAjcD55MU3y7gi8B84Hpgn4h4RtLngWciYrakc4G3APdExPEN8XYEVgAPkuwcAC6MiEsGzuTh6m/VgsZNmV1KnLXLzywlThmqtk7tmk8Z2nGdeq1dftWQ2siHf//HuevNdw85oCXt8dwt8Yg4o2H2wH4W2a1h2ZMbHp8CnNJPvJUk3TNmZpVUg+OaPu3ezCxLV0f1v/i7iJuZZXBL3MysxmpwTwgXcTOzLKrB6BQXcTOzDG6Jm5nVmPvEzcxqzKNTzMxqzC1xM7Mac5+4mVmN1eHaKS7iZmYZ3BI3M6uxLrfEzczqqw4t8TocfDUza4mybwoh6bD0fsTLJJ06wHKvk7RR0tsHzTH/6piZjS4dBabBSOokuX3l4cDuwHGSds9Y7lzgxrw5mplZPzoUuacc9gGWRcSjEbEOuBqY0c9yHwWuBZ7IlWPelTEzG21K7k7ZgeRuZr1Wps9tImkH4K1Ad94cfWDTzCxDV4EDm5JmAbManuqJiJ7GRfp5W98m/BeBUyJio5Tvw13EzcwyFLkUbVqwewZYZCUwuWF+R+DxPsvsDVydFvDxwBGSNkTEt7OCuoibmWUoeYjhQmCapJ2Ax4BjgXc1LhARO/U+lnQ5MGegAg4u4mZmmco8aBgRGySdRDLqpBO4NCKWSjoxfT13P3gjF3EzswxlXzslIuYB8/o812/xjoj35onpIm5mlqEOZ2y6iJuZZRjjIm5mVl++FK2ZWY25O8XMrMZcxM3MaqzTRdzMrL58t3szsxpzd4qZWY11tjqBHGpXxMdNmd3qFEq3dvmZpcRpx21T1jpVbRtXKZ+q/d6UtW3K4Ja4mVmNeZy4mVmNeXSKmVmNddXg3mcu4mZmGdwnbmZWY53uEzczq68a9Ka4iJuZZXF3iplZjY3xafdmZvXllriZWY25iJuZ1ZiLuJlZjfmMTTOzGvO1U8zMaqzLLXEzs/pyd4qZWY25O8XMrMY8OsXMrMZcxM3MaswXwDIzqzG3xM3Makwu4mZm9eXuFDOzGpOHGJqZ1VcNelNq8W3BzKwlOpR/ykPSYZIekrRM0qn9vH68pCXpdLuk1wwW0y1xM7MMZY5OkdQJXAQcDKwEFkq6ISLub1jsF8DfRMT/Sjoc6AH2HTDH8lI0M2svKjDlsA+wLCIejYh1wNXAjMYFIuL2iPjfdPYOYMfBgrqIm5llkPJPOewArGiYX5k+l+X9wHcHC+ruFDOzDEV6UyTNAmY1PNUTET2DhOt3+IukN5IU8QMG+1wXcTOzDEWKeFqwewZYZCUwuWF+R+DxF32m9GrgEuDwiFgz2Oe2VXdK93kf5Ff3dHP3TZ9raYwy4wAsWLCIQw89kYMPnkVPzzUtzadd41RpG1ctn6rFKWPb5NWp/FMOC4FpknaSNBY4FrihcQFJU4DrgL+PiIfzBB2wiEvaVtLidFol6bGG+bF5PkDSJwd47XuSfiZpqaTu9Oht0752zXxmnHDOUEKUEqPMOBs3buSss7q55JIzmDv3IubMWcCyZctblk87xqnaNq5aPlWKU9a2yUuK3NNgImIDcBJwI/AA8I2IWCrpREknpoudDmwL/GdaZ+8eLO6ARTwi1kTEHhGx
B9ANfKF3Pj26mkdmEQfeGRGvAV4FbAe8I2fMft1214M89dtnhxKilBhlxlmy5BGmTp3E5MkTGTt2DEceeSC33HJny/JpxzhV28ZVy6dKccraNnmVPDqFiJgXEbtExM4RcXb6XHdEdKePZ0bENg11du/BYhbuTpG0l6T5khZJulHSJElbpQPYd02XuUrSBySdA4xL9yhX9rNCv0sfdgFjyejkH81Wr17DxInjN81PmLAtq1cP2k1mBVRtG1ctnyoZ6W1T8uiUYVG0iAu4AHh7ROwFXAqcHRFPk3xNuFzSscA2EfHliDgVWJvuUY7vN6B0I/AE8AzwzWZXpF1FvHi/pjpcWq1GqraNq5ZPlYz0tukoMLVK0c/ejKTr4yZJi4FPkQ5Gj4ibgHtJzkiamTdgRBwKTEpjH9TfMpJmSbpb0t0bnl1WMOV6mzhxPKtWPblpfvXqNWy//UtbmFH7qdo2rlo+VTLS26ZdW+JLG/prpkfEIQCSOoDdgLVAoa0aEX8gOUo7I+P1nojYOyL27triLwumXG/Tp0/jl798nBUrVrFu3Xrmzl3AQQft0+q02krVtnHV8qmSkd42ZV87ZVhyLLj8H4HtJO0PIGmMpFemr32c5IjrccClksakz69veLyJpC0kTUofdwFHAA82sQ6bXHHBR7n122exy8snsezOC3nPMW9oSYwy43R1dXL66Scyc+Zsjjjiwxx++AFMmza1Zfm0Y5yqbeOq5VOlOGVtm7zKPrA5HNRfH1O/C0pnAM8CNwPnA1uRHJD8IjAfuB7YJyKekfR54JmImC3pXOAtwD2N/eKSJgBzSLpROoEfAB9Ph+FkGjfluLY7+Ll2+ZmlxBk3ZXYpcdpR1bZx1fKpkrK2TWKXIdXXx3//ndz15mWbH9WSWp77jM2IOKNh9sB+FtmtYdmTGx6fApzST7zVwOvyfr6Z2Uirw+Fkn3ZvZpbBd/YxM6sx3+3ezKzGalDDXcTNzLLU4QqBLuJmZhnqcKKsi7iZWabqV3EXcTOzDHIRNzOrryHe4mBEuIibmWVwS9zMrNZcxM3Maiu5OGu1uYibmWVyS9zMrLbcJ25mVmPCo1PMzGqrDvc2dRE3M8vkIm5mVlvuEzczqzUPMTQzq60OjxM3M6szd6eYmdWW3J1iZlZnbombmdWWx4mbmdWai7iZWW35tHszsxrzyT5mZjXmPnEzs1qr/hDD6mdoZtYiKvAvVzzpMEkPSVom6dR+Xpek89PXl0jac7CYbombmWUo8/ZskjqBi4CDgZXAQkk3RMT9DYsdDkxLp32Bi9P/M7klbmaWqaPANKh9gGUR8WhErAOuBmb0WWYG8NVI3AFsLWnSYBmamVk/Su5O2QFY0TC/Mn2u6DIvFBFtNwGzqhDDceoVp0q5OM7IxSlrAmYBdzdMs/q8/g7gkob5vwcu6LPMXOCAhvlbgL0G+tx2bYnPqkgMx6lXnCrl4jgjF6cUEdETEXs3TD19FlkJTG6Y3xF4vIllXqBdi7iZWdUsBKZJ2knSWOBY4IY+y9wAnJCOUtkPeDoifj1QUI9OMTMbARGxQdJJwI1AJ3BpRCyVdGL6ejcwDzgCWAb8HnjfYHHbtYj3/RrTqhiOU684VcrFcUYuzoiJiHkkhbrxue6GxwF8pEhMpZ3nZmZWQ+4TNzOrMRdxM7Maq32fuKR9SLqSFkraHTgMeDDte7I2JGkX4F+AqTT8DkfEQS1Lqk1JGgdMiYiHWp2L9a/WLXFJs4HzgYslfRa4ENgCOFXSvzURbxdJt0i6L51/taRPNZnbBEl/m07bNxlDkt4t6fR0fkq60yoaZ2dJm6WP3yDpHyVtXTDGjpK+Jek3klZLulbSjkVzSWNdK+lINX9himuAe4BPkRTz3qmZXMraxh+T9JI03lck3SPpkCbiTEjf/910fndJ7x/pGOn7jgIWA99L5/eQ1HdIXJ4475C0Zfr4U5Kuy3Nhp37ibCbpXZI+Ken03qlonLbT6rOchniG1L0k
Q3U2B34HvCR9fhywpIl480mub/DThufuayLOO4FfAVcAXwV+Aby9iTgXk1ww54F0fhtgYRNxFpO0WP8S+DnwBWBewRg3kQx36kqn9wI3NflzezNwZZrLOcArCr5/UYm/Q2Vt45+l/x9KMtb3NcA9TcT5bvr70xuvC7h3pGP0bmdgqz5/D838XS1J/z8A+BHJ9UHubCLO94CvA58A/l/vVNbvQl2nunenbIiIjcDvJf08In4HEBFrJT3fRLzNI+KuPheC39BEnH8DXhcRTwBI2g64GfhmwTj7RsSekn4KEBH/m54kUNTzkYxRfSvwxYi4oDdmAdtFxGUN85dL+qcmciEibgZulrQVcBxwk6QVwJeB/46I9YOE+I6kDwPfAv7YEPepJtIpaxv3/tIcAVwWET9Tc3cUGB8R35D0r2k+GyRtbEEMSP6+ni7hxgi9n30kcHFEXC/pjCbi7BgRhw01mXZT6+4UYJ2kzdPHe/U+mRaHZor4k5J2BiKN83ZgwLOlMnT0FvDUGprb1uvTy1f25rMdza3XeknHAe8B5qTPjSkY48m026Eznd5Nsl5NkbQtSct+JvBT4EvAniQt/sG8h6T75HaS1uIikmtVNKOsbbxI0vdJiviNafdBM3GeS7dNbz77AU+3IAbAfZLeBXRKmibpApJtXtRjkv6L5NvBvLRrr5m/h9slTW/ife2t1V8FhjIBm2U8Px6Y3kS8l5O0mH8PPAb8GJjaRJzPkZyV9d50+i5wbhNxjif5ar4SOBt4CHhHE3F2Jzl2cFw6vxNwasEYU9JcfgM8AXy7mW2TxroOuB/4V2Bin9fuHuHfobK2cQfJTmjrdH5b4NVNxNkTuI2k6N4GPFw0Thkx0jibp9tkIclO8mzgz5qM8zZgWjo/CTikiTj3A+vTn9ESku7Uwt077Tb5ZJ8GkvaKiEWS/pykNf2MpKMi4jsF45wL3EnSByhgAbBfRJxSIEYHsB/wFPCmNM4tEfFAkVwa4lVmlIGkI0h2LH9N0lr9McnX7D8UiPFXwF/wwtEpXy2Yx5C38WAH6CLiniI5pTG7gF3TfB6KwbuXhiXGUEl66UCvR8HuL0lTSY5ZvD59agHw24j4VXMZtgcX8QaS7gHeExH3pvPHAh+PiAHvrNFfnIjYs89zSyLi1QXj/CQi9i/ynow4RwH/DoyNiJ0k7QGcFRFvKRBjO+ADvLhw/kMT+XyD5ED0lelTxwHbRMQ7cr7/a8DOJAdse/tbIyL+sYlchrSNJf1wgJcjcg57lPS2gV6PiOsK5lXGTm4X4J/7iZN3nX5B0qXTX6d6RMTLC+bzMZLut+vSmH8HfDkiLigSp924iDeQ9HKSg4/Hk7SiTwD+NiJy9SdK+hDwYZJumZ83vLQlcFtEvLtgPmeSfG28Lobwg5K0CDgIuDUiXps+d29E5O5flHQ7yciCRfypcBIR1zaRz88i4jWDPTfA+x8Adh/KNmmIVco2LiGPywZ4OYrsLMvayUn6GdDNi3/mi4rEKYukJcD+EfFcOv/nwE+KNo7aTd1Hp5QqIh5NW9/fJrm7xiERsbZAiP8h6f/+LNB4E9Rnin51TJ0M/DmwQdIfSFofEREvKRinv1EGRQvW5kW6gwbxU0n7RXL7KSTtS9J3m9d9wESaO+jcV+823phuYyiwjctqQUfEoFerK2BvytnJbYiIi5t98zB0NYmGnUn6eMhDZ+rORZykVcoLi9pLScaf3ymJvHv6tMX+NEn3wJBFxJZpv+I04M+GEOoFowyAf6T4KIM5ko6IIZwJ27Cdx5BcM3l5Oj+V5KDVYO//Trr8lsD9ku7ihUMMc3cPNbxny6Lv6eOogcKTfPUvRNKRwCtp+JlHxFkFQgxpJ9fQlz3UoZz/McBrQfLtsIjLSP4mv5XO/x3wlYIx2o67U9h0wCRTqw6cSJoJfIzk7h6LSQ7C3R4RbyoYZ3OSseu9ZxDeCHwmz4FESc/wpx3cFiR/zL1j5wt9Kxjqdpb0N4O8f37e
XPrEfQtwYDp7a0TMGWj54SSpm2Q0xxuBS4C3A3dFxKBnXPbZye0BNLWTy+jL3lQoivZllylt3W8aMBARRc93aDsu4v1Qcpp8YytoeYvyuBd4HXBHROwh6RXAmRFxTME4rx3qL3vaz/oj4EfNjpApk6QJJNsGkiL3xEDLDxDnnDRO40HWRRFxava7MvP5/8DLIuJwJdfx2T8iCrUUew+AN/y/BUl//aCn8Kc7OQHnkpzVuOklkiGuRQ/QvxP4XkT8TtJpJEMXP120GyRtRJxMMjpqVvptcNdW7izbylDGJ7bbBLwFeAR4juRU+eeBpS3MZ2H6/2LSMfHA4ibi/BB4EPg08MomczkIOJ3kZJyfkxwA/liLtksplzVIYy0hGU7aO99Jc6eWl3Wq+13p/3cALyNpTDxSMMaLTvdvcp0aT5dfQPOny/eeKn9fOj+umd9jT/1PdT9js2yfJumyeDgidiIZO1zkgFvZViq5UNW3SU5Nv55Bbpran4h4I/AGkhN1eiTdq4IX9oqIH5Cc7HEaydf81wEfKppLSXova/CeiDiB5Ho3pw0h3tYNj7dqMsb4iPgG6VmaEbGBFx6Ey+s76c/8PJKLfP0CuCrPGyV9KP32tqukJQ3TL0h2VkU1ni7fHRHXA81ckmDniPgcyYk6RDJYYNQfkCyLD2y+0PqIWCOpQ1JHRPwwPXGnJSLirenDM9LxyFuRXlGuiVirgPPTOJ8gaVV/Ju/7Jd1CMorjJyTdKpuuDdMCZV3WAJKRRD9Nt4tI+sb/tYk4ZZ3q/iCwMSKuTbtk9iTZiedR9uio3tPl3wycO4TT5delJ5v1bpudaeirt6Fxn3gDSTeTHPH+LMmp+0+QFKu/amVeQyVpN+AYkoNka4CrgWuLFGFJXyC5Ps0fSb6dLCAZo1tkCGYpJJ0HvJo/tVCPJfnq/4nsdw0YbxLJNwuRdBesaiLGnsAFwKtIRodsR9LFU6gF3NAXfgBJH/t/AJ+Mgv3ZZUj7sg8j6RZ6JN1O0yPi+wXjHExy2eDdge+TnKn73oi4teSURyUX8QbpyQO947GPJ2n5XhkRTV/oqQok3UFS8K6JiMLdMX1ibUFy4ap/JrnuyWYlpNhMHm8jKQa9oxS+PYRYO/DiG0wsKBjjHSSjfiYDRwP7AqdF8YOAP42I1yq5Pv69EfE/vc8ViVM16beU/Uh+XndExJMtTqltuIhbLpJOIrlmxV4kBxUXkIxU+cEI5vDjiDigYdhjY7/q8yTXQDkvIv6zQMxzSb6lLOVPVx2MKDjmvKwWtKQ5JBdfezPJtl5LcrAz19msVVXGjtL65yLOi8ZCv+AlmjtDshIkfSMi3tnPyUy965X7dGVJ/0JSuBelB+0qJ23t3R4RuxZ4z0MkV/gbUh9tWS3osrowqqSsHaX1z0W8jUmaFBG/zjrJJtrw6m+961xg+e+SXHr22SF+blu2oMtQ1o7S+ucibqOSkhscBLADya3UbuGFZzcWvVhU27Wgy1LWjtL65yLextq1m6gMkt4z0OsRccVI5dKuyt5RWv9cxM1SkrYBJhcdFmj9845yZLiI26gm6VaSyy10kVze4DfA/Ig4uYVptS3vKMvn0+5ttNsqIn5Hcg/IyyJiL5KDk1YSSbdKekl6idufAZdJ+nyr82oXLuI22nWlByHfCfiqesPDO8ph5CJuo91ZJGdaLouIhUpu0fdIi3NqN95RDiP3iZvZsEovSXAa8OOI+HC6ozwvIo5ucWptwUXcRjVJfwa8nxffDi33jYnNWsmXorXR7mskl389lKRr5Xig5XcuaifeUQ4v94nbaPeXEXEa8Fw6bvlIYHqLc2o3XyO5cfOhwHySe8Y+09KM2oiLuI1269P/fyvpVSSXH/6L1qXTlryjHEbuTrHRric9AeU04AZgC5K7Hll5+u4oV+EdZWl8YNPMhpWkmcC1JHdjuox0RxkR3S1NrE24iNuoJmkCyU0cXhYRh6f3tdw/Ir7S4tTMcnGfuI12l5Oc
7POydP5h4J9alUw7kjRB0lfSS9IiaXdJ7291Xu3CRdxGu/ER8Q3SO86kdy3a2NqU2s7leEc5bFzEbbR7Lr2tWwBI2g94urUptR3vKIeRR6fYaHcyyaiUnSXdBmwHvL21KbUd7yiHkYu4jXY7A4cDk4GjgX3x30XZvKMcRu5OsdHutPQyqduQXB61B7i4tSm1nd4d5V+R9I0/gneUpXERt9Gut2/2SKA7Iq4HxrYwn3bkHeUwchG30e4xSf9Fcq3reZI2w38XZfOOchj5ZB8b1SRtDhwG3BsRj6Q3L5geEd9vcWptQ9Ic4DGSVvhewFrgroh4TUsTaxMu4mY2rLyjHF4u4mZmNea+PzOzGnMRNzOrMRdxM7MacxE3M6sxF3Ezsxr7P3NiN27HGTm5AAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 2 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import seaborn as sns\n",
    "\n",
    "vectors = []\n",
    "for words_set in words_sets:\n",
    "    # Start from all zeros, then mark each word present in this text with a 1\n",
    "    vector = np.zeros(len(vocabulary), dtype=int)\n",
    "    for word in words_set:\n",
    "        vector[vocabulary[word]] = 1\n",
    "    vectors.append(vector)\n",
    "\n",
    "sns.heatmap(vectors, annot=True, cmap='YlGnBu',\n",
    "            xticklabels=vocabulary.keys(),\n",
    "            yticklabels=['Text 1', 'Text 2', 'Text 3'])\n",
    "plt.yticks(rotation=0)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Our binary vector representation allows us to find shared words numerically. Suppose we wish to know whether the word in column `i` is present in both `text1` and `text2`. If we label the associated vectors `vector1` and `vector2`, then the word is present in both texts exactly when `vector1[i] * vector2[i] == 1`, since the product is 0 whenever either element is 0.\n",
    "\n",
    "**Listing 13. 13. Finding shared words using vector arithmetic**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "'by' is present in both texts 1 and 2\n",
      "'seashore' is present in both texts 1 and 2\n",
      "'the' is present in both texts 1 and 2\n",
      "'seashells' is present in both texts 1 and 2\n"
     ]
    }
   ],
   "source": [
    "vector1, vector2 = vectors[:2]\n",
    "for i in range(len(vocabulary)):\n",
    "    if vector1[i] * vector2[i]:\n",
    "        shared_word = value_to_word[i]\n",
    "        print(f\"'{shared_word}' is present in both texts 1 and 2\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each shared word contributes a product `vector1[i] * vector2[i]` of 1, while every other index contributes 0. Therefore, we can compute the shared word count merely by summing the pairwise products of `vector1[i]` and `vector2[i]` across every possible index `i`.\n",
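    "\n",
    "NumPy can also compute all of these pairwise products at once with the elementwise `*` operator. A small sketch on 2 hand-written binary vectors, assumed for illustration:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "vector_a = np.array([1, 0, 1, 1, 0])\n",
    "vector_b = np.array([1, 1, 0, 1, 0])\n",
    "\n",
    "# Each product is 1 only where both vectors hold a 1, i.e. at a shared word\n",
    "products = vector_a * vector_b\n",
    "print(products)        # [1 0 0 1 0]\n",
    "print(products.sum())  # 2\n",
    "```\n",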
    "\n",
    "**Listing 13. 14. Counting shared words using vector arithmetic**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "shared_word_count = sum(vector1[i] * vector2[i] \n",
    "                        for i in range(len(vocabulary)))\n",
    "assert shared_word_count == len(words_set1 & words_set2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The sum of the pairwise products across all vector indices is called the **dot product**. Given two NumPy arrays, `vector_a` and `vector_b`, we can compute their dot product by running `vector_a.dot(vector_b)`. We can also compute the dot product using the `@` operator, by running `vector_a @ vector_b`.\n",
    "\n",
    "**Listing 13. 15. Computing a vector dot product using NumPy**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "assert vector1.dot(vector2) == shared_word_count\n",
    "assert vector1 @ vector2 == shared_word_count"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The dot product of `vector1` and `vector2` equals the shared word count between `text1` and `text2`. Suppose, instead, we take the dot product of `vector1` with itself. Since `1 * 1 == 1` and `0 * 0 == 0`, that dot product simply counts the 1s in `vector1`, so it should equal `len(words_set1)`.\n",
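    "\n",
    "A quick check on a hand-written binary vector (assumed for illustration) confirms that its self dot product matches the number of 1s, which NumPy can also count with `sum()`:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "binary = np.array([1, 0, 1, 1, 0, 1])\n",
    "# Squaring 0s and 1s leaves them unchanged, so the self dot product counts the 1s\n",
    "assert binary @ binary == binary.sum()\n",
    "print(binary @ binary)  # 4\n",
    "```\n",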
    "\n",
    "**Listing 13. 16. Counting total words using vector arithmetic**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "assert vector1 @ vector1 == len(words_set1)\n",
    "assert vector2 @ vector2 == len(words_set2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now compute both the shared word count and the total unique word count using vector dot products: the total equals `vector1 @ vector1 + vector2 @ vector2 - vector1 @ vector2`, mirroring our efficient Jaccard function. Essentially, we can compute the Jaccard similarity using only vector operations. This vectorized formulation of the Jaccard similarity is called the **Tanimoto similarity**. Let's define a `tanimoto_similarity` function.\n",
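    "\n",
    "As a formula, for 2 vectors $\\vec{a}$ and $\\vec{b}$ the Tanimoto similarity reads:\n",
    "\n",
    "$$T(\\vec{a}, \\vec{b}) = \\frac{\\vec{a} \\cdot \\vec{b}}{\\vec{a} \\cdot \\vec{a} + \\vec{b} \\cdot \\vec{b} - \\vec{a} \\cdot \\vec{b}}$$\n",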
    "\n",
    "**Listing 13. 17. Computing text similarity using vector arithmetic**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "def tanimoto_similarity(vector_a, vector_b):\n",
    "    num_shared = vector_a @ vector_b\n",
    "    num_total = vector_a @ vector_a + vector_b @ vector_b - num_shared\n",
    "    return num_shared / num_total\n",
    "\n",
    "for i, text in enumerate([text2, text3], 1):\n",
    "    similarity = tanimoto_similarity(vector1, vectors[i])\n",
    "    assert similarity == jaccard_similarity(text1, text)"
   ]
  },
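  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Written as a formula, `tanimoto_similarity` computes the following for vectors $\\vec{a}$ and $\\vec{b}$:\n",
    "\n",
    "$$T(\\vec{a}, \\vec{b}) = \\frac{\\vec{a} \\cdot \\vec{b}}{\\vec{a} \\cdot \\vec{a} + \\vec{b} \\cdot \\vec{b} - \\vec{a} \\cdot \\vec{b}}$$\n",
    "\n",
    "For binary vectors representing word sets $A$ and $B$, this reduces to $\\frac{|A \\cap B|}{|A| + |B| - |A \\cap B|} = \\frac{|A \\cap B|}{|A \\cup B|}$, which is exactly the Jaccard similarity."
   ]
  },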
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Our `tanimoto_similarity` function was intended to compare binary vectors. Can it also meaningfully compare non-binary vectors? Let's find out.\n",
    "\n",
    "**Listing 13. 18. Computing the similarity of non-binary vectors**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The similarity of 2 non-binary vectors is 0.96875\n"
     ]
    }
   ],
   "source": [
    "non_binary_vector1 = np.array([5, 3])\n",
    "non_binary_vector2 = np.array([5, 2])\n",
    "similarity = tanimoto_similarity(non_binary_vector1, non_binary_vector2)\n",
    "print(f\"The similarity of 2 non-binary vectors is {similarity}\")"
   ]
  },
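  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see where 0.96875 comes from, we can expand each dot product by hand for $\\vec{a} = [5, 3]$ and $\\vec{b} = [5, 2]$:\n",
    "\n",
    "$$\\vec{a} \\cdot \\vec{b} = 5 \\cdot 5 + 3 \\cdot 2 = 31, \\quad \\vec{a} \\cdot \\vec{a} = 5^2 + 3^2 = 34, \\quad \\vec{b} \\cdot \\vec{b} = 5^2 + 2^2 = 29$$\n",
    "\n",
    "$$T(\\vec{a}, \\vec{b}) = \\frac{31}{34 + 29 - 31} = \\frac{31}{32} = 0.96875$$"
   ]
  },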
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The output is nearly equal to 1. Thus, `tanimoto_similarity` has successfully measured the similarity between 2 nearly identical vectors. The function is able to analyze non-binary inputs.\n",
    "\n",
    "## 13.2. Vectorizing Texts Using Word Counts\n",
    "\n",
    "A vector of word counts is commonly referred to as a **term-frequency vector**, or a **TF vector** for short. Word counts can provide a differentiating signal between texts. For example, suppose we're contrasting 2 texts, A and B. Text A mentions _Duck_ 61 times and _Goose_ twice. Text B mentions _Goose_ 71 times and _Duck_ only once. Let's compute the TF vectors of A and B, using a 2-element vocabulary `{'duck': 0, 'goose': 1}`. Given the vocabulary, we can convert the texts into TF vectors `[61, 2]` and `[1, 71]`. Below, we'll print the Tanimoto similarity of the 2 vectors.\n",
    "\n",
    "**Listing 13. 19. Computing TF vector similarity**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The similarity between texts is approximately 0.024\n"
     ]
    }
   ],
   "source": [
    "similarity = tanimoto_similarity(np.array([61, 2]), np.array([1, 71]))\n",
    "print(f\"The similarity between texts is approximately {similarity:.3f}\")"
   ]
  },
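  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In practice, TF vectors are built from tokenized text rather than typed in by hand. Below is one possible sketch, assuming each text has already been split into lowercase words. `to_tf_vector` and `duck_goose_vocab` are hypothetical names, and the word lists are synthetic stand-ins for texts A and B. The sketch relies on `collections.Counter`, which returns 0 for any word that never appears.\n",
    "\n",
    "```python\n",
    "from collections import Counter\n",
    "\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def to_tf_vector(words, vocab):\n",
    "    # Count every word, then store each count at its vocabulary index.\n",
    "    # Words missing from the vocabulary are ignored.\n",
    "    counts = Counter(words)\n",
    "    tf_vector = np.zeros(len(vocab), dtype=int)\n",
    "    for word, index in vocab.items():\n",
    "        tf_vector[index] = counts[word]\n",
    "    return tf_vector\n",
    "\n",
    "\n",
    "duck_goose_vocab = {'duck': 0, 'goose': 1}\n",
    "# Synthetic word lists matching the counts described above.\n",
    "text_a_words = ['duck'] * 61 + ['goose'] * 2\n",
    "text_b_words = ['goose'] * 71 + ['duck']\n",
    "\n",
    "tf_a = to_tf_vector(text_a_words, duck_goose_vocab)\n",
    "tf_b = to_tf_vector(text_b_words, duck_goose_vocab)\n",
    "print(tf_a.tolist(), tf_b.tolist())  # [61, 2] [1, 71]\n",
    "```\n",
    "\n",
    "Unlike a direct `vocabulary[word]` lookup, this version skips out-of-vocabulary words instead of raising a `KeyError`. Whether that is desirable depends on how the vocabulary was built."
   ]
  },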
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The TF vector similarity between the texts is very low: less than 0.025. Let's compare this to the binary-vector similarity of the 2 texts. Each text mentions both _Duck_ and _Goose_, so each has the binary-vector representation `[1, 1]`. Thus, the similarity of the 2 identical vectors should equal 1.\n",
    "\n",
    "**Listing 13. 20. Assessing identical vector similarity**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "assert tanimoto_similarity(np.array([1, 1]), np.array([1, 1])) == 1"
   ]
  },
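  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The binary vectors above can be derived directly from the TF vectors by marking every positive count as 1. The following sketch makes that conversion explicit and prints both similarities side by side. It redefines `tanimoto_similarity` with the same body as Listing 13.17 so that it runs on its own.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def tanimoto_similarity(vector_a, vector_b):\n",
    "    # Same definition as in Listing 13.17.\n",
    "    num_shared = vector_a @ vector_b\n",
    "    num_total = vector_a @ vector_a + vector_b @ vector_b - num_shared\n",
    "    return num_shared / num_total\n",
    "\n",
    "\n",
    "tf_a = np.array([61, 2])\n",
    "tf_b = np.array([1, 71])\n",
    "\n",
    "# A word is marked present (1) whenever its count is positive.\n",
    "binary_a = (tf_a > 0).astype(int)\n",
    "binary_b = (tf_b > 0).astype(int)\n",
    "\n",
    "print(binary_a.tolist(), binary_b.tolist())  # [1, 1] [1, 1]\n",
    "print(f'TF similarity: {tanimoto_similarity(tf_a, tf_b):.3f}')  # -> 0.024\n",
    "print(f'Binary similarity: {tanimoto_similarity(binary_a, binary_b):.3f}')  # -> 1.000\n",
    "```"
   ]
  },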
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Replacing binary values with word counts can greatly impact our similarity output. What will happen if we vectorize `text1`, `text2`, and `text3` based on their word counts? Let's find out.\n",
    "\n",
    "**Listing 13. 21. Computing TF vectors from word lists**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXMAAAEhCAYAAACN/EBuAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAAgAElEQVR4nO3de7wcdX3/8df7nBDkGk2ICZcQEBIKJRoxICKI4U5SUYsgYCta+KVpa6vWirSFwC9oxaL1p1UMh2ulogTBGkm4aAxGQS4JhABBSKo0BJoEonITzYXP74+ZEzaHc3Zm9kzO7s55P/OYR3Z2Zz/72dnZz/nud74zo4jAzMzaW0ezEzAzs/5zMTczqwAXczOzCnAxNzOrABdzM7MKcDE3M6sAF3MzswEkaYykBZKWSXpE0sd7WUaSvipphaSlkg7Kijtk66RrZmZ92Ah8KiLul7QTsFjSDyNiWc0yJwLj0untwDfS//vklrmZ2QCKiP+NiPvT2y8AjwK791jsvcA3I3E38HpJu9aL62JuZtYkkvYC3grc0+Oh3YEna+ZX8dqCv4U27mZ5vN/nIZg875kyEqmkBVNGNjuFllbWtlPGem6lXFrTePU3wnZ7np673vz+ye/8JTCt5q6uiOjquZykHYEbgU9ExPP9zbGNi7mZ2cCQ8ndipIX7NcV7y3jahqSQfysibuplkaeAMTXze6T39cndLGZmGURH7ikzliTgSuDRiPi3PhabA3w4HdVyKPBcRPxvvbhumZuZZSjSMs/hncCfAw9JWpLe90/AngARMQuYB0wBVgC/Az6aFdTF3MwsQ5nFPCJ+BtTtx4/k3OR/UySui7mZWYakZ6S1uZibmWVq/d2LLuZmZhk6Olq/VLZ+hmZmTZZnlEqzuZibmWUoeTTLVuFibmaWwcXczKwCXMzNzCqgQ53NTiGTi7mZWQa3zM3MKsDF3MysElzMzczanlvmZmYV4GJuZlYBHWr9Utn6GZqZNZnPmmhmVgHuZjEzqwCfaMvMrALcMjczqwAXczOzCpBHs5iZtT+3zM3MKsBDE83MKsCjWczMKsDdLGZmVdDpbhYzs/bXBn3mrf/boQELFy7m+OOnc+yx0+jquqGhGOdM2Jebjj6Yq46Y2K9cqhqnjHVc1ThlreNWy6eV1nGZcXKR8k9NUreYSxohaUk6rZb0VM380LwvImm4pOl1Hv8PSc9IWlIk+d5s2rSJmTNnccUVFzJ37te5+eaFrFixsnCcW1et5TP3LetvOpWMU9Y6rmqcsj6rVsqn1dZxWXFy6ygwNUndl46IdRExMSImArOAL3fPR8T6Aq8zHOizmANXAVMLxOvT0qXLGTt2V8aMGc3Qodswdeq7mD//nuJxfvM8z2/Y2P98KhintHVc1ThlfVYtlE/LreOS4uQVUu6pWRr+OyLpTEn3pq30SyV1SNpb0vK0Jd4p6S5JRwEXA/uly17cM1ZE/AT4dT/ex2Zr1qxj9OhdNs+PGjWCNWvWlRHaUmWt46rGKUsr5dNq63jA140KTE3S0A5QSQcC7wcOi4iNkrqA0yLiOklfAi4FHgQeiIgfS1oJ7Ju28M3M2ktn6+9ebDTDY4CDgUVpP/eRwD4AETELGAl8FDinjCS7SZomaZGkRV1d1/e6zKhRI1i9+tnN82vWrGPUqBFlpjHolbWOqxqnLK2UT6ut4wFfN23QMm+0mAu4qqb/fL+IuAhA0o7AbkAnsGNJeQIQEV0RMSkiJk2b9sFel5kwYRxPPPE0Tz65mvXrNzB37kKOOuqQMtMY9Mpax1WNU5ZWyqfV1vGAr5sO5Z+apNFx5j8CvivpKxHxrKQRwA4RsRK4BLgaWANcBrwPeAHYqYyEswwZ0smMGdM5++wL2LTpFU4++RjGjRtbOM55E8czcfgwhg0dwuzJk7hm+UrmrVrrOJS3jqsap6zPqpXyabV1XFac3NpgnLkiIt+C
0oXAixHxxXT+DJJulA5gA8lolZ2Bi4AjImKTpDnADRFxraTZwP7A3Ig4t0fsG4DDgRHAWuC8iLimfkaP50u8jsnznulviMpaMGVks1NoaWVtO2Ws51bKpTWN73clHnfclbnrzfLbz2pK5c/dMo+IC3vMXwdc18ui82uWOanm9ql1Yp+SNw8zswFXYveJpKuAPwHWRsSBfSzzbuD/AdsAz0bEkZkplpahmVlFRYdyTzlcA5zQ14OSXk8yIvCkiPhjIFdj1+dmMTPLUmLLPCIWStqrziJnADel+yCJiFw7ONwyNzPLMrBDE8cDb5B0h6TFkj6c50lumZuZZSkwmkXSNGBazV1dEdFV4NWGAG8Djga2A34u6e6IeDzrSWZmVk+Bbpa0cBcp3j2tAtZFxEvAS5IWAm8B6hZzd7OYmWUZ2IOGvg8cLmmIpO2BtwOPZj3JLXMzsywlHjQk6dvAu4FdJK0CLiAZgkhEzIqIRyXdCiwFXgGuiIiHs+K6mJuZZSmxDyMiTs+xzCUkR9Pn5mJuZpalDQ7ndzE3M8vS+rXcxdzMLEvOIzubysXczCyLi7mZWQW4mJuZVYB3gJqZVUDr13IXczOzTO5mMTOrABdzM7P2F50u5mZm7c87QM3MKsDdLGZmFdAGJwt3MTczy+JuFjOz9hedrd80dzE3M8vS+rXcxdzMLJN3gJqZVYD7zM3MKsAtczOzCmj9Wu5ibmaWJYa0/h5QF3MzsyzuMzczq4DWb5i7mJdhwZSRpcTZbs8L+h3j0FkfKyETy1LWZz553jP9jnH39K+VkAlMLmnbaaV1A7Bgyvj+B3HL3MysAjyaxcysAlzMzczany9OYWZWBe4zNzOrAHezmJlVgIu5mVkFtH4tdzE3M8vii1OYmVWBu1nMzCqg9Wt5O5xxwMysuTo68k9ZJF0laa2kh/t4/EOSlkp6SNJdkt6SK8dib8nMbPCR8k85XAOcUOfxXwFHRsQE4CKgK09Qd7OYmWUo85ihiFgoaa86j99VM3s3sEeeuC7mZmYZOpq3A/Qs4JY8C7qYm5llKNIylzQNmFZzV1dE5Ooq6RFnMkkxPzzP8i7mZmYZVGDvYlq4CxfvLV5PejNwBXBiRKzL8xwXczOzDAN5ni1JewI3AX8eEY/nfZ6LuZlZhjK7zCV9G3g3sIukVcAFwDYAETELmAGMAC5V8ldkY0RMyorrYm5mliHP+PG8IuL0jMfPBs4uGtfF3Mwsg3w+czOz9ldkB2izuJibmWVog4a5i7mZWRYXczOzCnAxNzOrgDa4NoWLuZlZFrfMzcwqQL7SkJlZ+3PL3MysAlzMzcwqwMXczKwC2mE0SxukWNzChYs5/vjpHHvsNLq6bmgoxjkT9uWmow/mqiMmtkQ+e+w6nFu/cx73z7+ExT+6hL/5i3qXEOxbWe+rjPfkOPWV8VmVtd2UlQ+0zropQh35p2ap+9KSRkhakk6rJT1VMz8074tIGi5peh+PjZV0h6Rlkh6R9LGib6LWpk2bmDlzFldccSFz536dm29eyIoVKwvHuXXVWj5z37L+pFJqPhs3vcK5n/1PDjr60xz53vP5yw8fxx+N271wnDLeV1nvyXHqK+OzKmu7KSufVlo3RZR8Qeetom4xj4h1ETExIiYCs4Avd89HxPoCrzMc6LWYAxuAT0TEAcA7gE9KGl8g9haWLl3O2LG7MmbMaIYO3YapU9/F/Pn3FI/zm+d5fsPGRtMoPZ/Va3/LkoefAODFl37PL1Y8xW6jhxfPp4T3Vdo6dpz6cUr4rMrabsrKp5XWTRGSck/N0vCPAklnSro3baVfKqlD0t6Slqct8U5Jd0k6CrgY2C9d9uLaOBHxdEQsSW8/D/wCaKzpAKxZs47Ro3fZPD9q1AjWrMl11aWtYmvks+ceuzDxj/fivgdW9De9hpT1nhxnYDV7u4HWXTdZ2qFl3tAOUEkHAu8HDouIjZK6gNMi4jpJXwIuBR4EHoiIH0taCeybtvDrxX0T
cCBwXyN5DQY7bL8t377sk3z6/36TF158udnpWJvwdtM/ZV6cYmtpNMVjgIOBRZKWAEcC+8Dmyx6NBD4KnJM3oKSdgRuBv42IF/tYZpqkRZIWdXVd32ucUaNGsHr1s5vn16xZx6hRI/KmUboy8xkypJNvX/ZJrv/enXz/1ub9vSvrPTnOwGiV7QZab93k1aH8U9NybPB5Aq6q6T/fLyIuApC0I7Ab0AnsmCtYsjP1JuDqiJjT13IR0RURkyJi0rRpH+x1mQkTxvHEE0/z5JOrWb9+A3PnLuSoow4p+PbKU2Y+sy6ZxmMrnuarV8wrOctiynpPjjMwWmW7gdZbN3m1QzFvdJz5j4DvSvpKRDwraQSwQ0SsBC4BrgbWAJcB7wNeAHbqLZCSPQbXAEsi4qsN5rPZkCGdzJgxnbPPvoBNm17h5JOPYdy4sYXjnDdxPBOHD2PY0CHMnjyJa5avZN6qtU3L57CD9+NDJ7+Lhx5dyd23fB6AC/71em5bsKRQnDLeV1nvyXHqK+OzKmu7KSufVlo3RXQotlrssigiX5KSLgRejIgvpvNnkHSjdJCMSJkO7AxcBBwREZskzQFuiIhrJc0G9gfmRsS5NXHfDSwAlgLdyXwmIm6rn9Hj/V67k+c9098QACyYMrKUONvteUG/Yxw6q18jOzcr6z1ZfWVsg3dP/1oJmbTetlPe9/Od/W4vn3j7z3LXm1uOO7wp7fPcLfOIuLDH/HXAdb0sOr9mmZNqbp/aR9w7SLptzMxaUhvs//Th/GZmWYZ0tH43i4u5mVkGt8zNzCqgDa5N4WJuZpZFbTCaxcXczCyDW+ZmZhXgPnMzswrwaBYzswpwy9zMrALcZ25mVgHtcG4WF3MzswxumZuZVcAQt8zNzNpfO7TM22EnrZlZU5V9cQpJJ0h6TNIKSef28viekhZIekDSUklTMnMs/rbMzAaXjgJTFkmdwNeBE4EDgNMlHdBjsfOA2RHxVuA0kusq1+VuFjOzDCWPZjkEWBERvwSQ9B3gvcCymmWC5GI/AMOAp7OCupibmWUouc98d+DJmvlVwNt7LHMhcLukvwV2AI7JCupuFjOzDEOUf5I0TdKimmlaAy95OnBNROwBTAGulVS3XrtlbmaWocgpcCOiC+iqs8hTwJia+T3S+2qdBZyQxvu5pNcBuwB9XrXaLXMzswwlj2a5DxgnaW9JQ0l2cM7pscxK4GgASfsDrwPqXuHaLXMzswxltnojYqOkjwG3AZ3AVRHxiKSZwKKImAN8Crhc0idJdoZ+JCLq/jxwMTczy1D2uVkiYh4wr8d9M2puLwPeWSSmi7mZWYZ2OALUxdzMLMM2LuZmZu3Pp8A1M6sAd7OYmVWAi7mZWQV0upibmbW/IR3uMzcza3vuZjEzq4DOZieQgzKOEG1Zk+fd2e/EF0wZWUYqTJ5X95QJZn0qYxssa/trte9DWfnA+H63q2c9envuejN9/+Oa0o53y9zMLIPHmZuZVYBHs5iZVcCQNjhZuIu5mVkGj2YxM6uATveZm5m1vzboZXExNzPL4m4WM7MK2MaH85uZtT+3zM3MKsDF3MysAlzMzcwqwEeAmplVgM/NYmZWAUPcMjcza3/uZjEzqwB3s5iZVYBHs5iZVYCLuZlZBfhEW2ZmFeCWuZlZBcjF3Mys/bmbxcysAuShiWZm7a8Nelna4teDmVlTdSj/lIekEyQ9JmmFpHPrLHeypJA0KSumW+ZmZhnKHM0iqRP4OnAssAq4T9KciFjWY7mdgI8D9+TKsbwUzcyqSQWmHA4BVkTELyNiPfAd4L29LHcR8AXg93mCupibmWWQ8k857A48WTO/Kr2v5vV0EDAmIubmzdHF3MwsQ5GWuaRpkhbVTNMKvZbUAfwb8Kkiz3OfuZlZhiJd5hHRBXTVWeQpYEzN/B7pfd12Ag4E7lDS1B8NzJF0UkQs6ito5Yr5ORP25dA3voHfrt/AX/x0Sb9iLVy4mM997nJeeeUVTjnlWKZN
O6Vp+ThOe+RSZhxorW2wjFxaMZ88Sj6f+X3AOEl7kxTx04Azuh+MiOeAXbrnJd0B/EO9Qg4Z3SySRkhakk6rJT1VMz80b+aShkua3sdjO0i6N425TNKMvHF7c+uqtXzmvmXZC2bYtGkTM2fO4oorLmTu3K9z880LWbFiZdPycZz2yKXMOK20DZaVSyvmk4cUuacsEbER+BhwG/AoMDsiHpE0U9JJjeZYt5hHxLqImBgRE4FZwJe759O9sHkNB3ot5sDLwOT0Nd4CnJRnTGVflv7meZ7fsLHRp78aZ+lyxo7dlTFjRjN06DZMnfou5s/PNUJo6+TjOG2RS6lxWmgbLCuXVswnj5JHsxAR8yJifETsExGfS++bERFzeln23VmtcujHDlBJZ9a0qC+V1CFpb0nL05Z4p6S7JB0FXAzsly57cY9EX4mIl9LZocA2QNOPnV2zZh2jR2/+pcOoUSNYs2ZdEzOywaaVtsFWyqUZ+ZQ8mmWraKjPXNKBwPuBwyJio6Qu4LSIuE7Sl4BLgQeBByLix5JWAvumre/e4g0F7gX2Bb4SEYsbycvMbGtoh2F/jeZ4DHAwsEjSEuBIYB+AiJgFjAQ+CpyTJ1hErE8L/RjgnZL272252iE/T9/y/QZTz2fUqBGsXv3s5vk1a9YxatSIrfqaZrVaaRtspVyakU87tMwbLeYCrqrpP98vIi4CkLQjsBvQCexYJGhE/AZYCBzfx+NdETEpIibtdmJvB0yVZ8KEcTzxxNM8+eRq1q/fwNy5CznqqEO26mua1WqlbbCVcmlGPmWfm2VraHRo4o+A70r6SkQ8K2kEsENErAQuAa4G1gCXAe8DXiAZO/kakt4I/CEinpO0PUmrf2aDeXHexPFMHD6MYUOHMHvyJK5ZvpJ5q9YWjjNkSCczZkzn7LMvYNOmVzj55GMYN25s0/JxnPbIpcw4rbQNlpVLK+aTRzucNVER+fY1SroQeDEivpjOn0HSjdIBbCAZrbIzyfkEjoiITZLmADdExLWSZgP7A3Mj4tyauBOBa0jWVyfw7e69u/VMnndnv3eSLpgysr8hAJg875lS4tjgU8Y2WNb212rfh7LygfH9rsVP/+4HuevNbtu/pym1P3fLPCIu7DF/HXBdL4vOr1nmpJrbp/YRdwnQ645RM7NW0A4t88odAWpmVjZfacjMrAKauWMzLxdzM7MMbVDLXczNzLK0w0FDLuZmZhmaeTBQXi7mZmaZWr+au5ibmWWQi7mZWfuTOpudQiYXczOzDG6Zm5lVgou5mVnbk1p/cKKLuZlZJrfMzczanvvMzcwqQHg0i5lZ21MbHALqYm5mlsnF3Mys7bnP3MysEjw00cys7XV4nLmZWRW4m8XMrO3J3SxmZlXglrmZWdvzOHMzs0pwMTcza3s+nN/MrAJ80JCZWQW4z9zMrBJaf2hi62doZtZkKvAvVzzpBEmPSVoh6dxeHt9W0vXp4/dI2isrpou5mVkGqSP3lB1LncDXgROBA4DTJR3QY7GzgN9ExL7Al4EvZMV1MTczy9RRYMp0CLAiIn4ZEeuB7wDv7bHMe4H/SG9/FzhaGR33LuZmZhlK7mbZHXiyZn5Vel+vy0TERuA5YETdqBFRyQmY5jiDK04r5eI47fWZlzkB04BFNdO0Ho9/ALiiZv7Pga/1WOZhYI+a+f8Gdqn3ulVumU9znEEXp5VycZyBiVNWLqWJiK6ImFQzdfVY5ClgTM38Hul9vS4jaQgwDFhX73WrXMzNzFrRfcA4SXtLGgqcBszpscwc4Mz09geAH0faRO+Lx5mbmQ2giNgo6WPAbUAncFVEPCJpJrAoIuYAVwLXSloB/Jqk4NdV5WLe86eN41Q/Tivl4jgDE6esXAZURMwD5vW4b0bN7d8DpxSJqYyWu5mZtQH3mZuZVYCLuZlZBVS5z7wlSNoO2DMiHmt2LmWQdAgQEXFfegjyCcAv0j7AvDHGA58GxlKzDUbEUWXnazZYVK5lLmmspGPS
29tJ2qmBGKMkXSnplnT+AElnNRDnPcAS4NZ0fqKknkOQBiSX9Lk3SZqqPCeQ6P35FwBfBb4h6fPA14AdgHMl/XOBUDcA9wPnkRT17qmRnPaRtG16+92S/k7S6xuMNUrSn6TTGxuM8XFJOytxpaT7JR3XYC79+twlndK9/Us6L/38D2ogl20lnSHpnyTN6J4aiLOHpO9JekbSWkk3StqjgTilfScqpdlHS5V85NX/IRnD+d/p/DhgfgNxbgFOBR5M54cADzUQZzHJYP8Hau4rFKesXNLnHgN8i+RosouB/Qo+/yGSoVTbA88DO6f3bwcsLbJeSvzMl6TrZF/gceASYF4DcU4F/ofkfBjfBH4FfKCBON2f0/HATcAfA/c3Yxvs/kyAw4E7gKnAPQ3kcitwPXAO8KnuqYE4PwQ+mr6XIcBHgB82Y91UcWp6AqW+meSLPbQ/xTN9zn3p/7VxljQQ5+5e4uQuemXm0iPmMGA6ybkf7kq/YNvkeN4Dvd0umhNwIfDXwK7A8O6pwfdyf/r/p4G/7S23nHEeBN5YMz+yu1gUjNNdQL8CvL8f+fT7c+9+LvB54Ix+5PJwf7a3evk3+L0q/TtRhalq3Sx/iOQsZMDmw2AbGXv5kqQR3c+VdCjJiW6KekTSGUCnpHGS/p2keDYjF9LnjyAp3mcDD5AUnYNIWk1Z1kvaPr39tpqYw4BXCqRxJknxvYvk18tiknNYNGKDpNPTmDen923TQJyOiFhbM7+OxrohF0u6HZgC3JZ2cxRZN93K+NyfknQZ8EFgXtod1ch7ukvShAae19M6SX8mqTOd/oyMQ9T7UOp3ojKa/dekzAn4V+CfgF8AxwLfAz7XQJyDgDtJNpA7SX6+v7mBONsDnyPp+lmU3n5dM3JJY30PWAb8IzC6x2OLcjx/2z7u3wWY0KTP/ACSfvzT0/m9gc80uO3cRvLT/yMkP+W/0ECcjvQze306P6LBbaffn3u6/f0pMC6d3xU4roFclgEbgMeApSTdbYV+YaZxxpIcpv4MsBb4L5LBAQO+bqo4VeqgoXTH3lnAcYCA2yLi8gZiHArcC+yXxnksIjaUnG6RnIaUkYukE0n6cN9J0lr8GfCNSI42G1CSDgP2YsvRLN9sMFa/RwxJ+gJwD0n/MsBPgUMj4jM5n193x2JE3N9ATg197pKGZ+Ty64J5jAXeAByR3rUQ+G1E/E+ROGVoxe9nq6haMX9bRCzucd+fRMTNfT2njzgPRMRbS8hnPPAPvLZoFRqCV1bhkzSbZMflt9K7ziBpQRY6bLi/JF0L7EOyj2NTendExN81EOs9wBeBoRGxt6SJwMyIOKlgnPsj4qAe9y2NiDfnfP6COg9H3s9c0p/WezwibsoR41ckXRC9nVw7IuJNeXKpifdxkm65m9KY7wMuj4h/LxhnJMkghb3Yclv+i4JxSvl+Vk3Vivn9wIcj4uF0/nTgExHx9oJxvgj8HLgp+rGCJD0IzCLpE+4uWvT8g5MRo8zCtywiDsi6b2uT9ChwQH/WbU2sxcBRwB3dX3BJD0fEgTmf/1ckO2PfRDLKp9tOwJ0R8Wf9zbEISVfXeTiKFr4ySFoKvCMiXkrndwB+nvcPXU2cu0h+8fT8PtxYME4p38+qqdpBQx8AvpvudDwC+DBJl0tRfwn8PbBJ0sskrZGIiJ0LxtkYEd9o4PVrTaKkwgfcL+nQiLgbQNLbaXzHY388DIwG/reEWBsi4jlteUWtIjscryPpH/88UHth3ReKdEeU0aJOl/to3tesk0vZXT6ipvimt/NduXhL2+fttsrQ/f3cKOn3NP79rJRKFfOI+KWk00h2rKwk2dnzcgNxCh9oVKumz/IHkv6aZMfjH2riF+mz7Hfhk/QQyc/ubUhGJqxM58eS7CweEJJ+kL7uTsAySfey5Xop1DWS2mLEEPB3FBgxFBHPkexIO72B1671nnovQ9JFUYikqST7OF63OVDEzBxP/VJGLkWPtL0auEfS99L595Gc
orWomyVNiQJHC/cmInZKv2PjqFk3g10lullqilW3N5J8Qf8AUPTnYBrzJOBd6ewdRfrd++iz3Jxfnj7LHoVvIskOn4YKX7oDq08DtSNL0pEZefykgZjbA//Mq7/AbgM+24ydumWSNItkNMpk4AqSX533RkRTjnRMW/ubdw5HxAMFnvsCr27/O5JsxxvT+cItaklnAx8nuULPEpIdondFxNFF4lRNVYp5qcVK0sXAwby6o/B0kqF7/1gwzqnArRHxvKTzSYZUXZTnZ25a+AR8geTIu80PkQyZK7QfoNVIGkWyjiEpUmvrLV8nzkGNjBTZWtL39S/AbhFxopLz17wjIgq1ZLt3vtb8vyNwS0QckfnkV2NsT9IdsWdETEt/uexXdEBAWST9J8lImJ9GxKP9iPMQybZzd0RMlPRHwL9ERN2urqqrxEFDEfE/tRPwMklLoHsqagpwbERcFRFXkZxMamoDcc5LC/nhJD9trwBy9aFHxE8i4g6SIzN/UjPdQXL4fNtK/8jdS3Ly/VNJfsJ/oMFwX5L0qKSLJOXa6bmVXUPy62C3dP5x4BMNxOn+ZfE7SbuRtGR3LRjjamA9cFg6/xTw2QZyKcuVJF2GX5X0S0nfTUfKFPX77l9ekraNiF+QDFMc1CpRzLtJOknScpLzavwEeIJk51Yjak/WNKzBGN07jaaSDOWaS3K6gUyS/iptgewnaWnN9CuSAzfa2T8DB0fEmRHxYeAQ4PxGAkXEZJKuiGeAyyQ9JOm88lItbJeImE26EzYiNrLlzsO8fqDkhGGXkJyU7FckO2uL2Cci/pXkgB8i4nc0tuOyFBGxgOTAufOBy0la13/VQKhV6br5L+CHkr5Pcl6dQa1SO0CBi0j6z34UEW+VNBloZGjZ54EH0rHDIuk7P7f+U3rVfTj1scAXVOxw6lJGWbSosg6dByAiVpO09haQdEnNoHkt0LIONf8FsCkibky7ag4iKV5FrE8PqOrOZR9q9rsMNEnzSc6y+XOSIYoHN9K9FhHvT29emH7mw0jPTDqYVaLPvJukRRExKR3f/daIeEXSgxHxlgZi7cqWfbqrG4ixPUkXzUMRsTyNOSEibi8aq0okXQK8Gfh2etdpJIeHn9P3s/qMtT/JuUdOJrpdYZwAAARZSURBVPmjcD1wY6N98P2V7ij8d+BAkpFII0nOvljo11RNX/nhJI2ULwIziuwrkXQsyWmGDwBuJzny9yNpV92Ak/RlknP6/IHkMPyFJOPVC484s9eqWjH/Ecmwqc+TnC9kLclf/8PqPrH3WLvz2osnLCwp1UEvHZf9znT2pxFRtNXZHefnJAV8dkQ8XVZ+jZJ0Ckmf+RiSPzBvB84vupO2+yhHJeeNfygirmvkyMf0V8KhJL8w746IZ4s8f2tQcvKxj5AcHT06IrZtbkbVULVivgPJjiMBHyL5+fWtiCh0ZjYl5+n4IPAIrx6AEg2Og7aUpJ9FxOE1Q9Vq+29fAX4NXBIRlzYlwRKU0aJO49xMssPyWJIulpdJfiEW+pXZSo0SSR8jOZjvbST7s35K8of8x83Ip2oqVczLIukxkrOwNa1/cTBKW5F3RUTmyARJsyPi1F6OMeg+GrDwsQVlKLFF3e8uulZrlEj6B9LD+dMdw1aiShTzHgclbPEQjR2UcAtwSkS8WEZ+lp+kXSMi82jX7uX6OsZgoA6E6qmsFnVJubhRMohUopiXRcnFIwLYHXgLMJ8tj7osfHIrG1xaaae3GyWDi4t5DUln1ns8Iv5joHKx+sr+NVYlbpQMTi7mGSS9ARhTdGiZWbO4UTI4uZj3QtIdwEkkIwAWkwxxvDMi/r6ZeZk1yo2S6qvU4fwlGhYRz5NcP/Gb6bCyY5qck1khku6QtHN6utj7gcsl/Vuz87Ktw8W8d0PSHVen8uoV383ajRslg4iLee9mkhzFtyIi7pP0JmB5k3MyK8qNkkHEfeZmFZWeWuB84GcR8ddpo+SSiDi5yanZVuBi3gtJrwPO
4rWX7Brwi+mameVRtVPgluVaklOQHk/S5fIhoOEro5g1gxslg4v7zHu3b0ScD7yUjsmdSnL2O7N2ci3JlX2OJ7lYyx7AC03NyLYaF/PebUj//216KbJhJBeJNmsnbpQMIu5m6V1XepDF+cAckiuKz2huSmaF9WyUrMaNksryDlCzipJ0NnAjyVWdriZtlETErKYmZluFi3kvJI0C/gXYLSJOTK/B+I6IuLLJqZmZ9cp95r27huSgod3S+ceBTzQtG7MGSBol6cr0VLhIOkDSWc3Oy7YOF/Pe7RIRs0mvzpJeFWVTc1MyK+wa3CgZNFzMe/dSegmzAJB0KPBcc1MyK8yNkkHEo1l69/cko1j2kXQnMBL4QHNTMivMjZJBxMW8d/sAJwJjgJNJxuZ6XVm7caNkEHE3S+/OT08d+gZgMnAp8I3mpmRWWHej5DCSvvPluFFSWS7mvevuV5wKXB4Rc4GhTczHrBFulAwiLua9e0rSZcAHgXmStsXrytqPGyWDiA8a6oWk7YETgIciYnl6gv8JEXF7k1Mzy03SzcBTwLHAQcDLwL0R8ZamJmZbhYu5WUW5UTK4uJibmVWA+4HNzCrAxdzMrAJczM3MKsDF3MysAlzMzcwq4P8D1h0ueQ+2LscAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 2 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "tf_vectors = []\n",
    "for words_list in words_lists:\n",
    "    tf_vector = np.zeros(len(vocabulary), dtype=int)\n",
    "    for word in words_list:\n",
    "        # Update the count of each word using its vocabulary index.\n",
    "        tf_vector[vocabulary[word]] += 1\n",
    "    tf_vectors.append(tf_vector)\n",
    "\n",
    "sns.heatmap(tf_vectors, cmap='YlGnBu', annot=True,\n",
    "            xticklabels=vocabulary.keys(),\n",
    "            yticklabels=['Text 1', 'Text 2', 'Text 3'])\n",
    "plt.yticks(rotation=0)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's compute the TF vector similarity between `text1` and each of the other 2 texts. We'll also print the original binary-vector similarity, for comparison. As the heatmap shows, `text2` repeats some of its words, while `text1` and `text3` contain no repeated words. So the similarity between `text1` and `text2` should shift, while the similarity between `text1` and `text3` should remain the same.\n",
    "\n",
    "**Listing 13. 22. Comparing metrics of vector similarity**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The recomputed Tanimoto similarity between texts 1 and 2 is 0.4615.\n",
      "Previously, that similarity equaled 0.4444 \n",
      "\n",
      "The recomputed Tanimoto similarity between texts 1 and 3 is 0.4167.\n",
      "Previously, that similarity equaled 0.4167 \n",
      "\n"
     ]
    }
   ],
   "source": [
    "tf_vector1 = tf_vectors[0]\n",
    "binary_vector1 = vectors[0]\n",
    "\n",
    "for i, tf_vector in enumerate(tf_vectors[1:], 2):\n",
    "    similarity = tanimoto_similarity(tf_vector1, tf_vector)\n",
    "    old_similarity = tanimoto_similarity(binary_vector1, vectors[i - 1])\n",
    "    print(f\"The recomputed Tanimoto similarity between texts 1 and {i} is\"\n",
    "          f\" {similarity:.4f}.\")\n",
    "    print(f\"Previously, that similarity equaled {old_similarity:.4f} \" \"\\n\")"
   ]
  },
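  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The shift for `text2` can be verified by hand, assuming (as the matching outputs indicate) that the word lists were lowercased and stripped of punctuation. `text2` contains _seashells_ and _the_ twice each, and its 5 other words once. Against `text1`, which contains each of its 6 words once, _seashells_ and _the_ each contribute $1 \\cdot 2$ to the dot product, while _by_ and _seashore_ each contribute $1 \\cdot 1$:\n",
    "\n",
    "$$\\vec{t}_1 \\cdot \\vec{t}_2 = 2 + 2 + 1 + 1 = 6, \\quad \\vec{t}_1 \\cdot \\vec{t}_1 = 6, \\quad \\vec{t}_2 \\cdot \\vec{t}_2 = 2^2 + 2^2 + 5 \\cdot 1^2 = 13$$\n",
    "\n",
    "$$T(\\vec{t}_1, \\vec{t}_2) = \\frac{6}{6 + 13 - 6} = \\frac{6}{13} \\approx 0.4615$$\n",
    "\n",
    "The binary vectors, by contrast, give $\\frac{4}{6 + 7 - 4} = \\frac{4}{9} \\approx 0.4444$."
   ]
  },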
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "TF vectors yield more informative comparisons, because they're sensitive to count differences between texts: here, the words that `text2` repeats raised its similarity to `text1`. This sensitivity is useful. However, it can also be detrimental when comparing texts of different lengths.\n",
    "\n",
    "### 13.2.1. Using Normalization to Improve TF Vector Similarity\n",
    "\n",
    "Imagine you are testing a very simple search engine. The search engine takes a query as input and compares it to document titles within a database. Suppose you run a query for \"Pepperoni Pizza\". The following 2 titles are returned:\n",
    "\n",
    "* Title A: \"Pepperoni Pizza! Pepperoni Pizza! Pepperoni Pizza!\"\n",
    "* Title B: \"Pepperoni\"\n",
    "\n",
    "Let's check if Title A ranks higher than Title B, relative to the query. We'll start by constructing TF vectors from a 2-element vocabulary `{'pepperoni': 0, 'pizza': 1}`.\n",
    "\n",
    "**Listing 13. 23. Simple search engine vectorization**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "query_vector = np.array([1, 1])\n",
    "title_a_vector = np.array([3, 3])\n",
    "title_b_vector = np.array([1, 0])"
   ]
  },
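  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The hand-typed vectors above can also be derived from the raw strings. Here is a minimal sketch, assuming a naive tokenizer that lowercases the text and keeps only runs of letters. `vectorize_title` and `pizza_vocab` are hypothetical names used only for this illustration.\n",
    "\n",
    "```python\n",
    "import re\n",
    "\n",
    "import numpy as np\n",
    "\n",
    "# The 2-element vocabulary described above.\n",
    "pizza_vocab = {'pepperoni': 0, 'pizza': 1}\n",
    "\n",
    "\n",
    "def vectorize_title(text, vocab):\n",
    "    # Lowercase, keep only runs of letters (dropping '!'), then count.\n",
    "    tf_vector = np.zeros(len(vocab), dtype=int)\n",
    "    for word in re.findall('[a-z]+', text.lower()):\n",
    "        if word in vocab:\n",
    "            tf_vector[vocab[word]] += 1\n",
    "    return tf_vector\n",
    "\n",
    "\n",
    "print(vectorize_title('Pepperoni Pizza', pizza_vocab).tolist())  # [1, 1]\n",
    "print(vectorize_title('Pepperoni Pizza! Pepperoni Pizza! Pepperoni Pizza!',\n",
    "                      pizza_vocab).tolist())  # [3, 3]\n",
    "print(vectorize_title('Pepperoni', pizza_vocab).tolist())  # [1, 0]\n",
    "```"
   ]
  },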
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We'll now compare the query to the titles, and sort the titles based on the Tanimoto similarity.\n",
    "\n",
    "**Listing 13. 24. Ranking titles by query similarity**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "'B: Pepperoni' has a query similarity of 0.5000\n",
      "'A: Pepperoni Pizza! Pepperoni Pizza! Pepperoni Pizza!' has a query similarity of 0.4286\n"
     ]
    }
   ],
   "source": [
    "titles = [\"A: Pepperoni Pizza! Pepperoni Pizza! Pepperoni Pizza!\", \n",
    "          \"B: Pepperoni\"]\n",
    "title_vectors = [title_a_vector, title_b_vector]\n",
    "similarities = [tanimoto_similarity(query_vector, title_vector)\n",
    "                for title_vector in title_vectors]\n",
    "\n",
    "for index in sorted(range(len(titles)), key=lambda i: similarities[i], \n",
    "                    reverse=True):\n",
    "    title = titles[index]\n",
    "    similarity = similarities[index]\n",
    "    print(f\"'{title}' has a query similarity of {similarity:.4f}\")"
   ]
  },
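  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Expanding the dot products shows why Title A is penalized. With $\\vec{q} = [1, 1]$, $\\vec{a} = [3, 3]$ and $\\vec{b} = [1, 0]$:\n",
    "\n",
    "$$T(\\vec{q}, \\vec{a}) = \\frac{6}{2 + 18 - 6} = \\frac{6}{14} \\approx 0.4286, \\qquad T(\\vec{q}, \\vec{b}) = \\frac{1}{2 + 1 - 1} = \\frac{1}{2} = 0.5$$\n",
    "\n",
    "Title A's large self dot product, $\\vec{a} \\cdot \\vec{a} = 18$, inflates the denominator even though Title A contains every query word."
   ]
  },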
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Unfortunately, Title B outranks Title A, even though Title A contains both query words. This discrepancy in rankings is caused by text size: Title A has 3x as many words as the query. We need to subdue the influence of text size on ranked results. One naïve approach is to simply divide `title_a_vector` by 3.\n",
    "\n",
    "**Listing 13. 25. Eliminating size differences through division**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "assert np.array_equal(query_vector, title_a_vector / 3)\n",
    "assert tanimoto_similarity(query_vector, \n",
    "                           title_a_vector / 3) == 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using simple division, we can manipulate `title_a_vector` to equal `query_vector`. Such manipulation is not possible for `title_b_vector`. Why is this the case? To illustrate the answer, we'll need to plot all 3 vectors in 2D space. We'll visualize the vectors as line segments that stretch from the origin.\n",
    "\n",
    "**Listing 13. 26. Plotting TF Vectors in 2D Space**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAEGCAYAAABo25JHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAAgAElEQVR4nO3deZzN9f7A8debCEVdzC03ZbRYBjODIUsa2W+LuLiZ22KpEOpqV35KpS5Ryh6GQULNTZYWstM+oxGR7SYJNYixm+X9++N7jDFmmGHOfM/yfj4e59H5Lud83985Ou/z+Xy+3/dHVBVjjDHBq4jbARhjjHGXJQJjjAlylgiMMSbIWSIwxpggZ4nAGGOC3CVuB5Bf5cuX19DQULfDMMYYv5KYmLhXVUNy2uZ3iSA0NJSEhAS3wzDGGL8iIr/kts26howxJshZIjDGmCBnicAYY4Kc340R5CQ1NZWdO3dy/Phxt0Mx2ZQoUYKKFStSrFgxt0MxxuQiIBLBzp07KV26NKGhoYiI2+EYD1Vl37597Ny5k8qVK7sdjjEmF17rGhKREiLyrYisFZEfReSlHPa5VERmi8hWEflGREIv5FjHjx+nXLlylgR8jIhQrlw5a6kZ4+O8OUZwAmimqhFAJNBGRBpk2+dB4E9VvREYAQy90INZEvBN9rkY4/u8lgjUcdizWMzzyF7z+m5gqud5PNBc7JvDGGPOkJoKmzd77/29etWQiBQVkSTgD+BzVf0m2y7XAL8CqGoacBAol8P79BCRBBFJSE5O9mbIF2znzp3cfffd3HTTTVx//fX07duXEydOFNrxp06dSkxMzBnr9u7dS0hISL7j2L59O++9915BhmeMuUDffw/168Ntt8GRI945hlcTgaqmq2okUBGoLyI1L/B9JqhqlKpGhYTkeIe0q1SVf/zjH7Rr144tW7awZcsWjh07xjPPPFMg75+enn7efdq3b8/nn3/O0aNHM9fFx8dz1113cemll+breBeSCNLS0vK1vzHm3I4fh+eeg3r1YPduGDUKLrvMO8cqlPsIVPUAsAxok23Tb8C1ACJyCXAFsK8wYipIS5cupUSJEnTr1g2AokWLMmLECKZNm8bhw4eJi4ujb9++mfvfeeedLF++HIBFixbRsGFD6tSpQ6dOnTh82OlNCw0N5dlnn6VOnToMGTKEOnXqZL5+y5YtZywDlClThujoaObPn5+5btasWZmthMTERKKjo6lbty6tW7dm9+7dAGzdupUWLVoQERFBnTp12LZtG/3792fVqlVERkYyYsQIjh8/Trdu3ahVqxa1a9dm2bJlAMTFxdG2bVuaNWtG8+bNC/ivakxwa9cOhgyBBx6AjRvhH//w3rG8edVQiIhc6XleEmgJ/JRtt3lAF8/zjsBSvci5M0XEa4/c/Pjjj9StW/eMdWXKlCE0NJStW7fm+rq9e/cyePBgFi9ezJo1a4iKiuLNN9/M3F6uXDnWrFnDgAEDuOKKK0hKSgJgypQpmUknq5iYGGbNmgXArl272Lx5M82aNSM1NZVHH32U+Ph4EhMT6d69OwMGDADg3nvvpU+fPqxdu5Yvv/ySChUqMGTIEJo0aUJSUhKPP/44Y8aMQURYt24dM2fOpEuXLplXAq1Zs4b4+HhWrFiRx0/IGJObQ4eclgBA//6waBFMngx/+Yt3j+vN+wgqAFNFpChOwnlfVReIyMtAgqrOA2KB6SKyFdgPdPZiPD7n66+/ZsOGDTRu3BiAkydP0rBhw8zt99xzT+bzhx56iClTpvDmm28ye/Zsvv3227Pe74477qB3796kpKTw/vvv06FDB4oWLcrGjRtZv349LVu2BJyupgoVKnDo0CF+++032rdvDzg3f+Vk9erVPProowBUq1aNSpUqsdkzctWyZUvKli1bAH8NY4LbwoXQowfcdx+8+io0bVp4x/ZaIlDVH4DaOax/Icvz40An
b8VQWMLCwoiPjz9jXUpKCnv27KFq1aqsX7+ejIyMzG2nfk2rKi1btmTmzJk5vu9lWToEO3TowEsvvUSzZs2oW7cu5cqdNaZOyZIladOmDXPmzGHWrFmZrQtVpUaNGnz11Vdn7H/o0KELO+FcYjTG5N/+/fDEEzB1KlSrBnfcUfgxBFytIVX12iM3zZs35+jRo0ybNg1wfnE/+eST9O3bl5IlSxIaGkpSUhIZGRn8+uuvmb/mGzRowBdffJHZfXTkyJHMX9rZlShRgtatW/PII4/k2C10SkxMDG+++Sa///57ZuuiatWqJCcnZyaC1NRUfvzxR0qXLk3FihX56KOPADhx4gRHjx6ldOnSZySJJk2aMGPGDAA2b97Mjh07qFq1ap4+D2NM7pYsgbAwmDEDBgxwrhBq1Kjw4wi4ROAGEWHOnDnEx8dz0003Ua5cOYoUKZLZD9+4cWMqV65MWFgYjz32WOZAb0hICHFxccTExBAeHk7Dhg356afswyin3XvvvRQpUoRWrVrluk/Lli3ZtWsX99xzT+a4RvHixYmPj+fZZ58lIiKCyMhIvvzySwCmT5/OyJEjCQ8Pp1GjRuzZs4fw8HCKFi1KREQEI0aMoHfv3mRkZFCrVi3uuece4uLi8n0lkjHmbH/9K1SuDN99B4MHQy69s14nFzk2W+iioqI0+8Q0GzdupHr16i5FdLYvv/ySmJgY5syZc9bVPRdj+PDhHDx4kFdeeaXA3rMw+NrnY4xbVJ0uoDVrYOTI0+sK4zZaEUlU1aictgVE0Tlf06hRI375JdfJgC5I+/bt2bZtG0uXLi3Q9zXGFI6ff4aePeHzz6FJEzh2DEqWLJwkcD6WCPzEnDlz3A7BGHMB0tNhzBjn5rAiRWDsWCchFPGhjnlLBMYY40V798ILL0B0NIwfD9dd53ZEZ/OhnGSMMYEhNRXi4iAjA666yhkT+Phj30wCYInAGGMKVGIiREVBt27OeADA9df7xlhAbiwRGGNMATh2zCkLcfPNkJwMc+ZA69ZuR5U3lggKwL59+4iMjCQyMpKrr76aa665JnO5kefukOwVPZcvX86dd96Z72O99dZblChRgoMHD+a4/frrr2fTpk1nrOvXrx9Dh+Z/zp+4uDh27dqV79cZE4zatYOhQ52WwIYNzrK/sERQAMqVK0dSUhJJSUn06tWLxx9/PHP51I1bBVXjf+bMmdSrV48PP/wwx+2dO3fOLDwHkJGRQXx8PJ0757+M04UkgryUzDYmUKSknC4S9/zzsHgxTJwIV17pblz5ZYnAyy6//HKAs0o7Z3XkyBG6d+9O/fr1qV27NnPnzs3xvbZt28bhw4cZPHhwrvWJYmJimD17dubyypUrqVSpEpUqVSI9PZ2nn36aevXqER4ezjvvvJO539ChQ6lVqxYRERH079+f+Ph4EhISuPfee4mMjOTYsWMsWbKE2rVrU6tWLbp375454U3WktkffPDBRf29jPEXn3wCNWvCyy87y9HR4K/V2APy8tGcqvb985/QuzccPQq333729q5dncfevdCx45nbPFMHXJQhQ4YwfPhwFixY4HnP02/66quv0qxZMyZPnsyBAweoX78+LVq0OKug26xZs+jcuTNNmjRh06ZN/P7771x11VVn7FOrVi2KFCnC2rVriYiIOGNOgtjYWK644gq+++47Tpw4QePGjWnVqhU//fQTc+fO5ZtvvqFUqVLs37+fsmXLMnr0aIYPH05UVBTHjx+na9euLFmyhCpVqvDAAw8wbtw4+vXrB5wumW1MoNu7Fx5/HN5916kT1Lat2xFdPGsR+IBFixYxZMgQIiMjadq0KcePH2fHjh1n7Tdz5kw6d+5MkSJF6NChQ66/vk/NS5CWlsZHH31Ep06dMo8zbdo0IiMjufnmm9m3bx9btmxh8eLFdOvWjVKlSgHkWFZ606ZNVK5cmSpVqgDQpUsXVq5cmbk9a8lsYwLV5587X/6zZjn3BqxZAw0auB3VxQvI
FsG5fsGXKnXu7eXLF0wLID9Ulf/+97/nrOi5bt06tmzZkjmnwMmTJ6lcufIZM5+d0rlzZ1q1akV0dDTh4eGZrQZVZdSoUbTOdinDwoULL/ocrBy1CQYVKkCVKjBuHNSq5XY0BcdaBIUke2nnrFq3bs2oUaMyS11///33Z+0zc+ZMBg0axPbt29m+fTu7du1i165dOdY0uuGGGyhfvjz9+/c/Y0L71q1bM27cOFJTUwGnpPSRI0do2bIlU6ZMyZzveP/+/WfFXLVqVbZv355ZMnv69OlER0df6J/DGL+gCpMmQZ8+znLNmrBqVWAlAbBEUGiyl3bOauDAgaSmphIeHk6NGjUYOHDgWa+fNWtW5kxip7Rv3/6MK4SyiomJ4aeffuIfWSY6feihhwgLC6NOnTrUrFmTnj17kpaWRps2bWjbti1RUVFERkYyfPhwALp27UqvXr2IjIxEVZkyZQqdOnXKHIfo1avXxf5ZjPFZ//sftGgBDz/sXA567Jiz3pdvDLtQVobaeJ19PsafpKc7JaIHDIBLLoHhw+Ghh3yrSNyFsDLUxhiTR3v3wksvOZeCjhsHFSu6HZH3+XmOM8aYi3fyJEyefLpIXFISzJsXHEkALBEYY4Lcd99B3brw4IPOncEAoaGBORaQG0sExpigdPQoPPWUcx/An386LYBzTAce0GyMwBgTlO6+22kB9OgBr78OV1zhdkTusRaBMSZoHDx4ukjcwIGwdCm8805wJwHwYiIQkWtFZJmIbBCRH0Xk3zns01REDopIkufxgrfi8abCKEO9fft2SpYsSWRkJBERETRq1OisctNgZaiNyc2CBVCjhnNFEMCtt8Jtt7kbk6/wZosgDXhSVcOABkAfEQnLYb9VqhrpebzsxXi8prDKUN9www0kJSWxdu1aunTpwmuvvXbWPlaG2pgzJSfDv/4Fd90FZctClnssjYfXEoGq7lbVNZ7nh4CNwDXeOp6vKsgy1FmlpKTwl7/85az1VobamNMWLXKKxMXHOy2BhASoV8/tqHxPoQwWi0goUBv4JofNDUVkLbALeEpVf8zh9T2AHgDX5WH2Z3kpf9d91alQh8QeiWe9Xl8suLuuC6IM9bZt24iMjOTQoUMcPXqUb745+89pZaiNOe2aa6B6defGsBo13I7Gd3l9sFhELgf+C/RT1ZRsm9cAlVQ1AhgFfJTTe6jqBFWNUtWokJAQ7wbsgryWoT7VNbRt2zbeeustevTokeP7WRlqE6wyMmDCBHjkEWe5Rg1YudKSwPl4tUUgIsVwksAMVT1rbsWsiUFVPxGRsSJSXlX3XsxxL/aXfEG2BPJ0vDyUoc6ubdu2dOvWLcdtVobaBKOtW50CccuXO4PAx45ByZJuR+UfvHnVkACxwEZVfTOXfa727IeI1PfEs89bMbnpYstQZ7d69WpuuOGGHLdZGWoTTNLT4Y03IDzcmShm4kRYssSSQH54s0XQGLgfWCciSZ51zwPXAajqeKAj8IiIpAHHgM7qb+VQ8yhrGequXbtSu3btzG0DBw6kX79+hIeHk5GRQeXKlTPHErI6NUagqhQvXpxJkybleryYmBj69+9/Vhnq7du3U6dOHVSVkJAQPvroI9q0aUNSUhJRUVEUL16c22+/nddeey2zDHXJkiX56quvMstQp6WlUa9ePStDbXzC3r0weDC0bAljxzrjAiZ/rAy18Tr7fExBO3ECpk1z6gMVKQK//ALXXRdc9YHy61xlqO3OYmOMX/nmG6dIXI8ep4vEVapkSeBiWCIwxviFI0fgiSegYUOnVMTHHwdvkbiCFjBF51QVsZ8EPsffuh6N72rXzmkBPPIIDBkCZcq4HVHgCIgWQYkSJdi3b5996fgYVWXfvn2UKFHC7VCMnzpw4PRcwS+8ACtWOAPClgQKVkC0CCpWrMjOnTtJTk52OxSTTYkSJagYLNM8mQI1b57z6//++50WQJMmbkcUuAIiERQrVozKlSu7HYYxpgD88Qc89hjM
nu3cG9Cxo9sRBb6A6BoyxgSGzz5zagPNmQOvvOIUiYvK8YJHU5ACokVgjAkM114LtWo54wBhORWtN15hLQJjjGsyMpzKoD17Oss1aji1giwJFC5LBMYYV2zeDE2bQu/e8PPPp6eQNIXPEoExplClpcHQoc5A8Lp1MGUKLFwIdpWxe2yMwBhTqPbtcxLB7bfDmDFQoYLbERlrERhjvO7ECXjnHWdM4KqrYO1a+PBDSwK+whKBMcarvvoKateGXr1g6VJn3bXXuhuTOZMlAmOMVxw+DP36QePGTsG4zz6DFi3cjsrkxMYIjDFe0a6dM1NY377w2mtQurTbEZncWIvAGFNg/vzzdJG4QYNg1SoYNcqSgK+zRGCMKRAffujcCDZokLN8yy3Ow/g+SwTGmIuyZ49TGK5DB7j6aujc2e2ITH5ZIjDGXLBPP3VaAQsWOOMA337rXCFk/IsNFhtjLlilSs4X/5gxUK2a29GYC2UtAmNMnmVkwOjR8PDDznJYmHNlkCUB/2aJwBiTJ5s2wa23wqOPwq+/WpG4QGKJwBhzTqmp8J//QEQEbNgAcXHO2IAViQscXksEInKtiCwTkQ0i8qOI/DuHfURERorIVhH5QUTqeCseY8yF+fNPGDYM7rrLSQRduoCI21GZguTNFkEa8KSqhgENgD4ikn26ib8DN3kePYBxXozHGJNHx487s4RlZMBf/wo//AAffOBcHmoCj9cSgaruVtU1nueHgI3ANdl2uxuYpo6vgStFxOoRGuOi1audbqA+fU4XiatY0d2YjHcVyhiBiIQCtYFvsm26Bvg1y/JOzk4WiEgPEUkQkYTk5GRvhWlMUDt0yKkL1KQJnDwJixZZkbhg4fVEICKXA/8F+qlqyoW8h6pOUNUoVY0KCQkp2ACNMYBTJG7sWPj3v52Zw1q2dDsiU1i8ekOZiBTDSQIzVPXDHHb5DchambyiZ50xphDs3+9c/VOqFLzyijMI3LCh21GZwubNq4YEiAU2quqbuew2D3jAc/VQA+Cgqu72VkzGmNPi46F69dNF4ho1siQQrLzZImgM3A+sE5Ekz7rngesAVHU88AlwO7AVOAp082I8xhhg925nIHjOHKhbF+691+2IjNu8lghUdTVwzquNVVWBPt6KwRhzpo8/hvvucy4PHToUnngCLrGKY0HP/gkYE0Suvx7q1XPqBVWp4nY0xldYiQljAlh6Orz9Njz4oLNcvbpzWaglAZOVJQJjAtSGDc49Af36OZPHWJE4kxtLBMYEmJMnYfBgZ56AzZvh3XediWOsSJzJjY0RGBNgDhyAESOgfXsYOdKpFWTMuViLwJgAcOyYMwB8qkjcunUwa5YlAZM3lgiM8XMrVzpF4h59FJYtc9b97W/uxmT8iyUCY/xUSgr07g3R0ZCWBosXQ/Pmbkdl/JGNERjjp9q1g+XL4fHHnTpBl13mdkTGX1kiMMaP7N3rFIgrVQpefdUpEteggdtRGX9nXUPG+AFVZ/C3enV48UVnXcOGlgRMwbBEYIyP++03pxsoJgYqV4YHHnA7IhNorGvIGB+2YIFTHTQ1FYYPd+4SLlrU7ahMoLFEYIwPu/FGZ56AUaOc58Z4g3UNGeND0tOdu4K7dnWWq1WDTz+1JGC8yxKBMT7ixx+hcWNnjoC9e61InCk8lgiMcdnJk/Dyy06RuG3b4L33YP58KxJnCo8lAmNcduCAUxyuUyendHRMjHN/gDGFJc+DxSJSEwgDMn+nqOo0bwRlTKA7ehQmToS+fU8XiatQwe2oTLDKUyIQkReBpjiJ4BPg78BqwBKBMfm0bBk89BD8739Qs6ZTH8iSgHFTXruGOgLNgT2q2g2IAK7wWlTGBKCDB6FnT2jWzOn6WbbMisQZ35DXrqFjqpohImkiUgb4A7jWi3EZE3DatXNKRj/9NAwa5NQLMsYX5DURJIjIlcBEIBE4DHzltaiMCRDJyU5V0FKl4D//ce4KrlfP7aiMOVOeuoZUtbeqHlDV
8UBLoIuni8gYkwNV5zLQrEXiGjSwJGB8U54SgYgsEZHbAVR1u6r+ICITzvOaySLyh4isz2V7UxE5KCJJnscL+Q/fGN+zcye0bevUCLrxxtN3CRvjq/I6WFwZeNZz9dApUed5TRzQ5jz7rFLVSM/j5TzGYozPmjcPwsJg6VKnVMQXX0CNGm5HZcy55TURHMC5augqEZkvIue9YkhVVwL7LyY4Y/xNlSpwyy3OfQFWKdT4i7wmAlHVNFXtDfwX5x6CvxbA8RuKyFoR+VREcv3dJCI9RCRBRBKSk5ML4LDGFIy0NKc89Kk5AqpVg08+geuvdzcuY/Ijr4lg/KknqhoHdAUWXeSx1wCVVDUCGAV8lNuOqjpBVaNUNSokJOQiD2tMwfjhB2eWsKefdiaStyJxxl/lNRHUF5HIUwuqmgjsuJgDq2qKqh72PP8EKCYi5S/mPY0pDCdOOFcC1a0LO3bA++/DnDlWJM74r7wmgtbAVBHJOkle24s5sIhcLeKU1hKR+p5Y9l3MexpTGFJSYOxYpzjchg1OsTgrEmf8WV5vKPsDuA14V0RuBv4NnPOfvojMxKlPVF5EdgIvAsUAPPcjdAQeEZE04BjQWVX1Qk7CGG87cgQmTIDHHoOQEFi/Hq66yu2ojCkYeU0EoqoHgbtEZBCwnPPUGlLVmPNsHw2MzuPxjXHNkiXw8MPw888QEeHUCrIkYAJJXruG5p16oqqDgKHAdi/EY4zPOHDAqRLaogVccgmsWOEkAWMCTZ5aBKr6Yrbl+cB8r0RkjI9o3x5WrYJnn3UGh0uWdDsiY7zjnIlARFar6i0icgjI2n8vgKpqGa9GZ0wh+/13uPxyp1DckCFOS6BuXbejMsa7ztk1pKq3eP5bWlXLZHmUtiRgAokqTJ/ulIc4VSTu5pstCZjgcM5EICIlRKSfiIz23N2b56ktjfEXO3bAHXc4dwdXrQoPPuh2RMYUrvN9sU8FUoFVwO1ADZxLR40JCHPnwn33OS2CkSOhd2+rD2SCz/kSQZiq1gIQkVjgW++HZIz3qTo3gVWrBk2bwqhREBrqdlTGuON8l4+mnnqiqmlejsUYr0tLg6FD4f77neWqVWH+fEsCJridLxFEiEiK53EICD/1XERSCiNAYwrK2rXOAHD//nD0qBWJM+aUc3YNqar1lhq/d/w4DB7stATKlYP4eOjQwe2ojPEdeb2z2Bi/degQvPOOM3Xkhg2WBIzJzhKBCUiHDzsTxqSnO0XiNmyAuDgoW9btyIzxPZYITMBZtAhq1oRnnoGVK511Np+RMbmzRGACxv790K0btG7tTBKzahXcdpvbURnj++xOYRMw2reHL76A55+HgQNtxjBj8soSgfFre/ZA6dJOkbhhw6B4cYiMPP/rjDGnWdeQ8UuqzuBvWBi88IKzrn59SwLGXAhLBMbvbN8Obdo44wE1akCPHm5HZIx/s64h41fmzHHKQ4jA6NHwyCNQxH7OGHNRLBEYv3CqSFyNGs7UkW+/DZUquR2VMYHBfksZn5aaCq+95twVDFClCnz0kSUBYwqSJQLjs9ascQaABwxw7hA+ccLtiIwJTJYIjM85dgyee85JAnv2OOMCs2fDpZe6HZkxgckSgfE5R45AbCx06eLUCGrXzu2IjAlsXksEIjJZRP4QkfW5bBcRGSkiW0XkBxGp461YjG9TVQ4dgtdfd7qAypd3EkBsLPzlL25HZ0zg82aLIA5oc47tfwdu8jx6AOO8GIvxQceOHeO5556jadMh1KzpTBizapWzrXx5d2MzJph47fJRVV0pIqHn2OVuYJqqKvC1iFwpIhVUdbe3YjK+Y/Xq1XTt+iTbtvUGulCxYgpffFGGhg3djsyY4OPmGME1wK9Zlnd61p1FRHqISIKIJCQnJxdKcMY7Dh06RN++fWnSpAnbtr0O/At4mZtvfsSSgDEu8YvBYlWdoKpRqhoVYoXl/dZnn31G9erNGDMmzrPmKUqVimb8+Kt4//3pboZmTFBz
887i34BrsyxX9KwzAWbfvn08/vgTTJ9eFPgcmAw8yR13XMX48eOpWLGiyxEaE9zcbBHMAx7wXD3UADho4wOBRVWJj4+natW/M336fTgJYC1XXjmbGTNmMH/+fEsCxvgAr7UIRGQm0BQoLyI7gReBYgCqOh74BLgd2AocBbp5KxZT+Hbv3k2fPn2YMwdgGZAO9OKee1IYNWoB1sVnjO/w5lVDMefZrkAfbx3fuENVmTJlCo8//gQpKQeBG4HPuOqqIUyYMJC2bdu6HaIxJhu/GCw2/uHnn3+mRYvbefDB/5GScuq2kK08/PBCNm1abEnAGB9lZajNRUtPT2f06NE8++wHnDgxBogAZlK5cjUmTRpDs2bN3A7RGHMO1iIwF2XDhg00atScfv1OcOLECqA8Iu148slE1q9PtCRgjB+wFoG5IKmpqQwdOpRXXnmFkydLA+8DsVSvHkdc3FvUr1/f7RCNMXlkLQKTb4mJidSuHc3AgYc5eTIN2Mcll4QzaNBukpKWWxIwxs9Yi8Dk2bFjxxg0aBDDhm1AdTbwN+Br6tc/RmxsLDVr1nQ7RGPMBbAWgcmTFStWUKNGU15/PRzV+cBBLr20GW+80ZYvv/zSkoAxfsxaBOacUlJSePbZZxk/fjywAmgAvEh09FfExk7mhhtucDlCY8zFshaBydUnn3xCtWrNGT/+VEG4x7nssluZOPFali1baEnAmABhLQJzlr179/Lvf/fjvfdKAouBWOBJ7rrrGsaNG8c11+RYLdwY46esRWAyqSqzZ8+matXbee+97sBEIJGyZWcxa9Ys5s6da0nAmABkLQIDwG+//Ubv3r2ZN68YsBxIBR7mX/86xttvf0J5mzvSmIBlLYIgp6pMnDiR6tXDmDdvHrAW+Jirr27BggXtmDHjXUsCxgQ4SwRBbNu2bdx2W2t69NjJoUMTPGu30qvXUjZtWsIdd9zhanzGmMJhXUNBKD09nbfffpvnn//IUySuFjCDG24IIzZ2LNHR0W6HaIwpRNYiCDLr16/n5ptv48knMzhxYhnwF0Ta8swzP7BuXYIlAWOCkLUIgsTJkyf5z3/+w6uvvkpq6qkicRMIC5vG1KmjiIqKcjtEY4xLrEUQBL777jsiI6MZNOgkqU67Z1kAAA83SURBVKkZwH6KFYvglVf28f33KywJGBPkrEUQwI4ePcoLL7zAm29uQTUeuBr4ggYNThAbG0tYWJjbIRpjfIC1CALUsmXLCAuL5o036qI6F9jHpZdG89Zb7Vm9erUlAWNMJmsRBJiDBw/yzDPPMGHCBE4XiRtIs2YJTJo0ncqVK7scoTHG11iLIIDMnz+fqlWbM2HCDM+aflx++a3ExlZm8eJPLAkYY3JkiSAAJCcnExNzL23bLuD335cCrwDQrl0lNm36kO7duyMi7gZpjPFZ1jXkx1SVmTNn0qfPWxw48DrQFFhMuXIzGTfufTp27GgJwBhzXl5tEYhIGxHZJCJbRaR/Dtu7ikiyiCR5Hg95M55AsnPnTtq2bcu9987hwIEVQCTQnfvvn86mTZ/SqVMnSwLGmDzxWotARIoCY4CWwE7gOxGZp6obsu06W1X7eiuOQJORkcHEiRN56qmnOXz4EHADMJe//e0NYmNfoU2bNm6HaIzxM95sEdQHtqrq/1T1JDALuNuLxwt4W7ZsoWnT1vTq9TuHD8d61m6jT5/V/PTTUksCxpgL4s1EcA3wa5blnZ512XUQkR9EJF5Ers3pjUSkh4gkiEhCcnKyN2L1aWlpaQwfPpyaNR9i1aq3gBeAY9x4Yw1WrlzJ6NGjKV26tNthGmP8lNtXDc0HQlU1HPgcmJrTTqo6QVWjVDUqJCSkUAN02w8//ED9+rfx9NNFOXlyGVCaIkXupH//jaxbl0CTJk3cDtEY4+e8edXQb0DWX/gVPesyqeq+LIuTgNe9GI9fOXHiBK+99hqvvfYaaWllcIrEjaVWrZnExY2iTp06bodojAkQ3mwRfAfcJCKV
RaQ40BmYl3UHEamQZbEtsNGL8fiNr7/+moiIaF5+OYO0NOVUkbhXX00hMXG5JQFjTIHyWotAVdNEpC+wECgKTFbVH0XkZSBBVecBj4lIWyAN2A909VY8/uDIkSMMHDiQESN+Bj4E/gqsoFGjdGJjY6lWrZrLERpjApGoqtsx5EtUVJQmJCS4HUaBW7JkCd27P8eOHU8B/wSSKFGiL8OGdaZ3794UKeL2cI4xxp+JSKKq5lhz3u4sdtmBAwd46qmniI2NBVbiXHU7gBYtvmfixHcJDQ11N0BjTMCzn5kumjt3LlWrtiQ2drZnzWOUKdOUuLgqLFr0sSUBY0yhsETggt9//51//rMz7dot4o8/lgIvA9Chww1s2jSHLl26WHkIY0yhsa6hQqSqzJgxg759R3Hw4HCgCbCI8uXfY/z4eDp06OB2iMaYIGSJoJDs2LGDXr168emnl+NMGHMM6ErXrkV4442FlC1b1uUIjTHByrqGvCwjI4OxY8cSFlaDTz/9FEgEPqRixVYsXPgvpkyZbEnAGOMqSwRetHnzZm69tRV9+vzJkSNxAIj8zGOPfcPGjcto1aqVuwEaYwzWNeQVaWlpvPHGGwwc+CmpqeOA6kAcVarUYvLkcTRu3NjtEI0xJpO1CArY2rVriYpqSv/+JUlNXQqUokiR2xkwYCtr135rScAY43OsRVBAjh8/zuDBgxk6dChpaaWBjsAYwsNnM3XqaCIjI90O0RhjcmQtggLw5ZdfEh7elFdfvcRTJO5PihePZMiQoyQmLrckYIzxadYiuAiHDx9mwIABjBz5G/ARUB5YSpMmMGnSJKpUqeJyhMYYc37WIrhAixYtolq12xg58lYgHthFyZK3MmZMZ5YvX25JwBjjN6xFkE9//vknTzzxBHFxcThF4uoBz9Kq1XomTpzFdddd526AxhiTT9YiyIcPP/yQqlVbERcX71nzKFdcEc20aTX57LMFlgSMMX7JEkEe7Nmzhw4dOtGhwzKSk5cBrwDwz39WZdOmedx///1WJM4Y47esa+gcVJVp06bx2GNjSUl5A7gF+JSQkBlMmDCHdu3auR2iMcZcNEsEufjll1/o2bMnCxdeiTMWcBi4n+7dL+WNNz7nyiuvdDlCY4wpGNY1lE1GRgajR48mLKwmCxcuBL4DPuC66/7O4sVdiY2dZEnAGBNQLBFk8dNPP9G4cQseffQwR49OA5wicf36JbJhwzKaN2/ucoTGGFPwrGsISE1NZdiwYbz44mLS0sYBVYFJVK8eweTJ42nQoIHbIRpjjNcEfYvg+++/p27dpgwYcAVpaUuBYhQp0pqBA3fw/fffWBIwxgS8oG0RHD9+nJdeeolhw4aRnl4GaAeMoHbtD4mLG0N4eLjbIRpjTKEIyhbB6tWrqVkzmiFDSpKeDvAnl14aybBh6Xz77TJLAsaYoOLVFoGItAHeBooCk1R1SLbtlwLTgLrAPuAeVd3urXgOHTpE//7PMXbs78A8oCzwOdHRRZk0aRI33nijtw5tjDE+y2stAhEpCowB/g6EATEiEpZttweBP1X1RmAEMNRb8Xz22WdUq9aMsWObAx8Av1KqVDTjx9/H0qVLLQkYY4KWqKp33likITBIVVt7lp8DUNX/ZNlnoWefr0TkEmAPEKLnCCoqKkoTEhLyFcuECRPo2bMnDMrfOdSpUIfEHomZy/KSU0ZCXzwdXt0JdVmze02+3jen1yc8nEDdv9UFoMf8HkxcMzFf75nT69+58x161O0BwITECfRc0DNf75nT6x+u8zAT7poAQOKuRKImRuXrPXN6fda/hzHGO0QkUVVz/B/Wm2ME1wC/Zlne6VmX4z6qmgYcBMplfyMR6SEiCSKSkJycnO9A2rdvT/ny5fP9OmOMCQbebBF0BNqo6kOe5fuBm1W1b5Z91nv22elZ3ubZZ29u73shLQKAWbNmMXfuXEaOHElISEi+X2+MMf7sXC0Cbw4W/wZcm2W5omddTvvs9HQNXYEz
aFzgOnfuTOfOnb3x1sYY49e82TX0HXCTiFQWkeJAZ5xLdbKaB3TxPO8ILD3X+IAxxpiC57UWgaqmiUhfYCHO5aOTVfVHEXkZSFDVeUAsMF1EtgL7cZKFMcaYQuTV+whU9RPgk2zrXsjy/DjQyZsxGGOMObegvLPYGGPMaZYIjDEmyFkiMMaYIGeJwBhjgpzXbijzFhFJBn65wJeXB3K9Wc3P2Ln4pkA5l0A5D7BzOaWSquZ4N63fJYKLISIJud1Z52/sXHxToJxLoJwH2LnkhXUNGWNMkLNEYIwxQS7YEsEEtwMoQHYuvilQziVQzgPsXM4rqMYIjDHGnC3YWgTGGGOysURgjDFBLiATgYi0EZFNIrJVRPrnsP1SEZnt2f6NiIQWfpR5k4dz6SoiySKS5Hk85Eac5yMik0XkD89kRDltFxEZ6TnPH0SkTmHHmFd5OJemInIwy2fyQk77uU1ErhWRZSKyQUR+FJF/57CPX3wueTwXf/lcSojItyKy1nMuL+WwT8F+h6lqQD1wSl5vA64HigNrgbBs+/QGxnuedwZmux33RZxLV2C027Hm4VxuBeoA63PZfjvwKSBAA+Abt2O+iHNpCixwO848nEcFoI7neWlgcw7/vvzic8njufjL5yLA5Z7nxYBvgAbZ9inQ77BAbBHUB7aq6v9U9SQwC7g72z53A1M9z+OB5iIihRhjXuXlXPyCqq7EmXMiN3cD09TxNXCliFQonOjyJw/n4hdUdbeqrvE8PwRs5Ox5xf3ic8njufgFz9/6sGexmOeR/aqeAv0OC8REcA3wa5blnZz9DyJzH1VNAw4C5QoluvzJy7kAdPA02+NF5NoctvuDvJ6rv2joadp/KiI13A7mfDxdC7Vxfn1m5XefyznOBfzkcxGRoiKSBPwBfK6quX4uBfEdFoiJINjMB0JVNRz4nNO/Eox71uDUdYkARgEfuRzPOYnI5cB/gX6qmuJ2PBfjPOfiN5+LqqaraiTOXO/1RaSmN48XiIngNyDrr+KKnnU57iMilwBXAPsKJbr8Oe+5qOo+VT3hWZwE1C2k2ApaXj43v6CqKaea9urM0ldMRMq7HFaORKQYzhfnDFX9MIdd/OZzOd+5+NPncoqqHgCWAW2ybSrQ77BATATfATeJSGURKY4zkDIv2z7zgC6e5x2BpeoZdfEx5z2XbP21bXH6Rv3RPOABz1UqDYCDqrrb7aAuhIhcfaq/VkTq4/x/5nM/NDwxxgIbVfXNXHbzi88lL+fiR59LiIhc6XleEmgJ/JRttwL9DvPqnMVuUNU0EekLLMS56mayqv4oIi8DCao6D+cfzHQR2Yoz6NfZvYhzl8dzeUxE2gJpOOfS1bWAz0FEZuJctVFeRHYCL+IMgqGq43Hmtr4d2AocBbq5E+n55eFcOgKPiEgacAzo7KM/NBoD9wPrPP3RAM8D14HffS55ORd/+VwqAFNFpChOsnpfVRd48zvMSkwYY0yQC8SuIWOMMflgicAYY4KcJQJjjAlylgiMMSbIWSIwxpggZ4nABBURSfdUnlwvIh+ISCm3Y8orEZkkImFux2ECj10+aoKKiBxW1cs9z2cAiee4mcpbMVziqQ9jjE+wFoEJZquAGwFE5D5PDfgkEXnHczMPInJYREZ46sIvEZEQz/rlIvJ2ltZFfc/6y8SZr+BbEfleRO72rO8qIvNEZCmwxHOn7jDPa9eJyD2e/Zp63jteRH4SkRlZ7oZdLiJRhf9nMoHOEoEJSp76LH/HuRO1OnAP0NhT6CsduNez62U4d3PWAFbg3EV8SinP/r2ByZ51A3Bu968P3AYME5HLPNvqAB1VNRr4BxAJRAAtPPudKhdSG+gHhOHMRdG4QE/emGwCrsSEMedRMksJglU4t+r3wCnW953nx3dJnPK/ABnAbM/zd4GsxcxmgjM/gYiU8dSHaQW0FZGnPPuUwFPmAKec8Kl5DG4BZqpqOvC7iKwA6gEpwLequhPAE2sosLoAzt2YHFki
MMHmmOdXfCZP18tUVX0uD6/XXJ6fWhagg6puynaMm4EjeYzxRJbn6dj/p8bLrGvIGFgCdBSRvwKISFkRqeTZVgSnWBnAvzjzl/mpfv1bcKpyHsQpEPholn792rkccxVwjzgTkITgTH/5bQGekzF5Zr80TNBT1Q0i8n/AIhEpAqQCfYBfcH7F1/ds/wPPl7/HcRH5HqfyaHfPuleAt4AfPO/1M3BnDoedAzTEmYdagWdUdY+IVCvwEzTmPOzyUWPOIevlptnWLweeUtWEwo/KmIJlXUPGGBPkrEVgjDFBzloExhgT5CwRGGNMkLNEYIwxQc4SgTHGBDlLBMYYE+T+Hw04Lc8pwsK/AAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.plot([0, query_vector[0]], [0, query_vector[1]], c='k', \n",
    "         linewidth=3, label='Query Vector')\n",
    "plt.plot([0, title_a_vector[0]], [0, title_a_vector[1]], c='b', \n",
    "         linestyle='--', label='Title A Vector')\n",
    "plt.plot([0, title_b_vector[0]], [0, title_b_vector[1]], c='g', \n",
    "         linewidth=2, linestyle='-.', label='Title B Vector')\n",
    "plt.xlabel('Pepperoni')\n",
    "plt.ylabel('Pizza')\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Within our plot, `title_a_vector` and `query_vector` point in the same direction. Shrinking `title_a_vector` would make the 2 line segments exactly identical. Meanwhile, `title_b_vector` and `query_vector` point in different directions, so there is no way to make these vectors overlap: shrinking or lengthening `title_b_vector` will never align it with the other 2 line segments.\n",
    "\n",
    "We've gained some insight by representing our vectors as line segments. Every vector has a geometric length, called its **magnitude**. Given vector `v`, we can measure the magnitude naively, by computing the Euclidean distance between `v` and the origin. We can also find the magnitude using NumPy, by running `np.linalg.norm(v)`. Finally, we can compute the magnitude using the Pythagorean theorem. According to that theorem, the magnitude of `v` is the square root of the sum of its squared elements, which equals `(v @ v) ** 0.5`.\n",
    "\n",
    "**Listing 13. 27. Computing vector magnitude**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Query Vector's magnitude is approximately 1.4142\n",
      "Title A Vector's magnitude is approximately 4.2426\n",
      "Title B Vector's magnitude is approximately 1.0000\n",
      "\n",
      "Vector A is 3x as long as Query Vector\n"
     ]
    }
   ],
   "source": [
    "from scipy.spatial.distance import euclidean\n",
    "from numpy.linalg import norm\n",
    "\n",
    "vector_names = ['Query Vector', 'Title A Vector', 'Title B Vector']\n",
    "tf_search_vectors = [query_vector, title_a_vector, title_b_vector]\n",
    "origin = np.array([0, 0])\n",
    "for name, tf_vector in zip(vector_names, tf_search_vectors):\n",
    "    magnitude = euclidean(tf_vector, origin)\n",
    "    # Compare floating-point values with a tolerance, not exact equality\n",
    "    assert np.isclose(magnitude, norm(tf_vector))\n",
    "    assert np.isclose(magnitude, (tf_vector @ tf_vector) ** 0.5)\n",
    "    print(f\"{name}'s magnitude is approximately {magnitude:.4f}\")\n",
    "\n",
    "magnitude_ratio = norm(title_a_vector) / norm(query_vector)\n",
    "print(f\"\\nVector A is {magnitude_ratio:.0f}x as long as Query Vector\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The magnitude of `title_b_vector` is exactly 1. A vector with a magnitude of 1 is referred to as a **unit vector**. One benefit of unit vectors is that they are easy to compare. Since all unit vectors share the same magnitude, magnitude plays no role in their similarity; only their directions matter.\n",
    "\n",
    "Dividing any nonzero vector by its magnitude produces a vector with a magnitude of 1. This division by the magnitude is called **normalization**. Running `v / norm(v)` will return a **normalized vector** with a magnitude of 1. We'll now normalize our vectors and generate a unit-vector plot. Within the plot, 2 of the vectors should be identical.\n",
    "\n",
    "**Listing 13. 28. Plotting normalized vectors**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAD4CAYAAAD8Zh1EAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAAgAElEQVR4nO3deXiU5b3/8fdXdDJgVRSwiiBEf0IFsggBVI6yKIsb0CqV1CJSFbXQHqytYk9VtNqjYk+twGm1KrgcAUExQw8tp1iX6kFLUFzAohgxJPE6YhBcKALy/f0xyWSbkAmZZLbP67pyXXmWPHM/meHDnXt7zN0REZHUd1CiCyAiIvGhQBcRSRMKdBGRNKFAFxFJEwp0EZE0cXCiXrhz587es2fPRL28iEhKWrt27Sfu3iXasYQFes+ePSkuLk7Uy4uIpCQz+7CxY2pyERFJEwp0EZE0oUAXEUkTCWtDj2bPnj2UlZWxa9euRBdFMlgwGKRbt24ccsghiS6KSLMkVaCXlZVx2GGH0bNnT8ws0cWRDOTuVFZWUlZWRnZ2dqKLI9IsSdXksmvXLjp16qQwl4QxMzp16qS/EiUlJVWgAwpzSbhM+AyWlJRw+eWX6z+uNBNTk4uZjQF+C7QDHnT3O+sd/w0wvGqzA3C0u3eMZ0FFJD5KSko488xvU17+FuXl5TzzzDMEg8FEF0vioMkaupm1A+YB5wB9gEIz61P7HHe/1t3z3T0fmAM83RqFbQtmxnXXXRfZvueee5g1a1abluGyyy5j6dKlAFxxxRVs2LChRdfbvHkz/fr1i3ps/fr1jBgxgt69e3PiiSdyyy23sG/fvha9XnPceuut3HjjjXX2rVu3jpNPPrnZ11q3bh0rVqyIV9HSUklJCcOGjaC8fAFwHy+88AJvvPFGgksl8RJLk8sgYJO7l7j7bmARMG4/5xcCC+NRuETIysri6aef5pNPPjmgn9+7d29cy/Pggw/Sp0+fpk88AP/85z8ZO3YsM2fOZOPGjbz11lv8/e9/57e//W1crv/11183eU5hYSGLFy+us2/RokUUFhY2+/UOJNDj/X4ls5KSEoYPH86WLR8C1xEIPE1RURGDBw9OdNEkTmIJ9OOALbW2y6r2NWBmPYBs4K+NHJ9qZsVmVrx169bmlrVNHHzwwUydOpXf/OY3DY5t3ryZESNGkJuby1lnnUVpaSkQrlFfffXVDB48mOuvv55Zs2YxefJkzjjjDHr06MHTTz/N9ddfT05ODmPGjGHPnj0A3HbbbQwcOJB+/foxdepUoj09atiwYRQXFxMKhcjPzyc/P5/evXtHRmCsXbuWoUOHMmDAAEaPHs1HH30U2Z+Xl0deXh7z5s2Leq9PPPEEQ4YMYdSoUQB06NCBuXPnMnv2bABmzZrFPffcEzm/X79+bN68GYDHH3+cQYMGkZ+fz1VXXRUJ72984xtcd9115OXlcccddzB+/PjIz//lL3/h29/+dp0y9OrViyOPPJJXX301su/JJ5+MBPr//M//cNppp9G/f38mTJjAF198AcCaNWs4/fTTycvLY9CgQezYsYObb76ZxYsXk5+fz+LFi9m2bRvjx48nNzeXU089lTfffDNyX5MmTWLIkCFMmjQp6u8m3ZSUlHDGGRdTWno6AMHgapYvnxl57yVNuPt+v4CLCLebV29PAuY2cu4NwJymrunuDBgwwOvbsGFD5Hug1b7259BDD/UdO3Z4jx49fPv27T579my/5ZZb3N39/PPP9wULFri7+0MPPeTjxo1zd/fJkyf7eeed53v37nV391tuucWHDBniu3fv9nXr1nn79u19xYoV7u4+fvx4X7Zsmbu7V1ZWRl73+9//vodCocj1lixZ4u7uQ4cO9TVr1tQp44QJE3zu3Lm+e/duP+200/zjjz92d/dFixb5lClT3N09JyfHX3jhBXd3/+lPf+p9+/ZtcK/XXnut
33vvvQ32d+zY0T/99FO/5ZZbfPbs2ZH9ffv29Q8++MA3bNjg559/vu/evdvd3a+55hp/5JFHIu/b4sWL3d1937593rt370j5CgsLI/dY2+zZs33GjBnu7r569Wqv/mxs3brVzzjjDP/iiy/c3f3OO+/0W2+91b/66ivPzs72v//97+7uvmPHDt+zZ4/Pnz/fp02bFrnu9OnTfdasWe7u/uyzz3peXl7k/enfv7/v3LmzQVmq1f4sprr333/fu3YtcPiHww7PyurmK1euTHSx5AABxd5IrsbSKVoOdK+13a1qXzQTgWkxXDOpHX744Vx66aXcd999tG/fPrJ/9erVPP10uHtg0qRJXH/99ZFjEyZMoF27dpHtc845h0MOOYScnBy+/vprxowZA0BOTk6klvvcc89x9913s3PnTrZt20bfvn254IIL9lu2u+++m/bt2zNt2jTefvtt3n77bUaOHAmEmziOPfZYtm/fzvbt2znzzDMjZf3Tn/7U8l9MlWeffZa1a9cycOBAINx0c/TRRwPQrl07LrzwQiDcHzFp0iQef/xxpkyZwurVq3n00UcbXO/iiy/m9NNP59e//nWd5pZXXnmFDRs2MGTIEAB2797NaaedxsaNGzn22GMjr3/44YdHLedLL73EU089BcCIESOorKzks88+A2Ds2LF13tt0VV0zr6h4HOhKIDCOUOgh1czTVCyBvgY4ycyyCQf5ROB79U8ys28BRwKr41rCBJkxYwb9+/dnypQpMZ1/6KGH1tnOysoC4KCDDuKQQw6JDIU76KCD2Lt3L7t27eKHP/whxcXFdO/enVmzZjU5hGzVqlUsWbKEF198EQj/ddW3b19Wr677K9++fXtMZe7Tp0/kWtVKSkro1KkTHTt25OCDD67TQVpdPndn8uTJ/Pu//3uDawaDwTr/sU2ZMoULLriAYDDIhAkTOPjghh+57t27k52dzQsvvMBTTz0VuR93Z+TIkSxcWLdL5q233orp/van/vuVjqKFuZpZ0luTbejuvheYDqwE3gGedPf1ZnabmY2tdepEYFHVnwQt1tifFPH4isVRRx3Fd7/7XR566KHIvtNPP51FixYB8F//9V+cccYZB3x/1eHYuXNnvvjii8iolsZ8+OGHTJs2jSVLlkRqlr1792br1q2RANyzZw/r16+nY8eOdOzYkZdeeilS1mguueQSXnrpJVatWgWEa9o//vGPufXWW4HwEsevvfYaAK+99hoffPABAGeddRZLly7l448/BmDbtm18+GH0FT27du1K165duf322/f7n2NhYSHXXnstJ5xwAt26dQPg1FNP5eWXX2bTpk0AfPnll7z77rv07t2bjz76iDVr1gDw+eefs3fvXg477DA+//zzyDXPOOOMyL0///zzdO7cudHafLqp7gCtqMhDYZ45YppY5O4r3L2Xu5/o7ndU7bvZ3UO1zpnl7jNbq6CJcN1119UZ7TJnzhzmz59Pbm4ujz32WItGg3Ts2JErr7ySfv36MXr06EjzQWMWLFhAZWUl48ePJz8/n3PPPZdAIMDSpUu54YYbyMvLIz8/n//93/8FYP78+UybNo38/PxG/xNr3749oVCIO+64g169etG5c2eGDBnCJZdcAsCFF14YaQqaO3cuvXr1AsI1+9tvv51Ro0aRm5vLyJEjI52x0VxyySV07959v0MRJ0yYwPr16+uMbunSpQsLFiygsLCQ3NxcTjvtNP7xj38QCARYvHgxP/rRj8jLy2PkyJHs2rWL4cOHs2HDhkin6KxZs1i7di25ubnMnDmTRx55ZL+/43QRHpo4vKrT/iGysvIU5pmiNWvC+/tqqlNU2t6yZcs8OzvbN2/eHNfrTps2zR988MG4XrO1pepnsaYD9BWHgR4MBtUBmmZoYaeoZIjx48fXGWYYDwMGDODQQw/l17/+dVyvKw01bDP/BkVFRaqZZxAFurSqtWvXJroIGUEdoAIKdJGUF16b5SIqKhaiMM9sCnSR
FFY9mqW8/CPgVQKBBQrzDKZAF0lRNc0s/wT2EAxepTbzDKdAF0lBddvMK8nKOkthLsn3gItEy5Tlc996663IYl9HHXUU2dnZ5Ofnc/bZZ1NRUcFFF10ENFzBcMGCBUyfPr3ZZZgxYwbHHXdc1KV5d+7cSadOnSLT8quNHz++wUqMsbj33nvZuXNns38uVTTsAP0FoZDCXBToDWTK8rk5OTmsW7eOdevWMXbsWGbPns26detYtWoVXbt2jfyHEo81xvft28eyZcvo3r07L7zwQoPjHTp0YPTo0Sxbtiyyb8eOHbz00ktNrm0TzYEEeixL/SYDjWaR/VGg15NJy+c2prpGv3v37gZL0ta2detWLrzwQgYOHMjAgQN5+eWXo17v+eefp2/fvlxzzTUN1mWpVlhYGFlWAWDZsmWMHj2aDh068OWXX/KDH/yAQYMGccopp1BUVASEQ/inP/0p/fr1Izc3lzlz5nDfffdRUVHB8OHDGT48/BCthQsXkpOTQ79+/bjhhhsir1F7qd/66+Eko5rp/DehMJeoGptx1NpfscwUHTq04de8eeFjX34Z/fj8+eHjW7c2PBaLTFo+t1rt13N3/+CDDyLn11+StvZ2YWGh/+1vf3N39w8//NC/9a1vRb3+FVdc4Y8++qjv2LHDu3btGll2t7avvvrKjz76aP/kk0/c3X306NG+fPlyd3e/8cYb/bHHHnN3908//dRPOukk/+KLL/w///M//cILL/Q9e/bU+X326NHDt27d6u7u5eXl3r17d//44499z549Pnz48Mjvn1pL/daXbDNF33//fT/++OOrloA+xgOBMzQDNEOxn5miqqFHUXv53NpWr17N974XXmhy0qRJkcWv4MCXzx08eDA5OTn89a9/Zf369U2WrfbyuRs3bowsn5ufn8/tt99OWVlZ1OVzW8OqVauYPn06+fn5jB07ls8++yzyAIpqu3fvZsWKFYwfP57DDz+cwYMHs3LlygbXCgQCjB07lqVLl/LJJ5/w+uuvM3r0aCD8kIs777yT/Px8hg0bxq5duygtLWXVqlVcddVVkRUcjzrqqAbXXbNmDcOGDaNLly4cfPDBXHLJJZEVJmsv9ZvMah5O8UOgHcHgdpYv/4Vq5tJAUo9yef75xo916LD/45077/94UzJh+dyW2rdvH6+88sp+HzC8cuVKtm/fTk5ODhDuAG3fvj3nn39+g3MLCwv55S9/ibszbtw4DjnkECB8n0899RS9e/eOa/nrL/WbjBq2mS+iqOguhblEpRp6IzJh+dxY1F+StrZRo0YxZ86cyPa6desanLNw4UIefPBBNm/ezObNm/nggw/4y1/+ErXTctiwYbz33nvMmzevzqqLo0ePZs6cOZE+htdffx2AkSNHcv/990c6ordt29agzIMGDeKFF17gk08+4euvv2bhwoUMHTr0QH4VbS56B6jCXBqnQN+PdF8+Nxb1l6St7b777qO4uJjc3Fz69OnD73//+zrHd+7cyZ///GfOO++8yL5DDz2Uf/mXf2H58uUNXuuggw7ioosuorKysk7o3nTTTezZs4fc3Fz69u3LTTfdBISHdB5//PHk5uaSl5fHE088AcDUqVMZM2YMw4cP59hjj+XOO+9k+PDh5OXlMWDAAMaN298zzpODRrPIgbCW/GNviYKCAi8uLq6z75133tnvmtkibSWRn8Xq0SylpV2BpwkELlGYS4SZrXX3gmjHkroNXSTTlJSUMHToGMrKSoFSsrL6EAotVphLTNTkIpIkqptZyspCwNUEg0GFuTSLaugiSaBhm/lGrc0izRZTDd3MxpjZRjPbZGZRnxtqZt81sw1mtt7MnohvMUXSlzpAJV6arKGbWTtgHjASKAPWmFnI3TfUOuck4EZgiLt/amZHt1aBRdJJdZt5RUUIhbm0VCxNLoOATe5eAmBmi4BxQO0lAK8E5rn7pwDu/nG8CyqSbqpHs4Q7QOcQCLyjMJcWiaXJ5ThgS63tsqp9
tfUCepnZy2b2ipmNiXYhM5tqZsVmVrx169YDK3ErqqysjCyAdcwxx3DcccdFtk8//XQgvHBV9XhnCC88FW3WY2M2b95M+/btyc/PJy8vj9NPP52NGzc2OO+EE05osH/GjBncddddzb6vBQsWUFFR0eyfk9ZTM53/WACCwYcV5tJi8RrlcjBwEjAMKAT+YGYd65/k7g+4e4G7F3Tp0iVOLx0/nTp1iiwpe/XVV3PttddGtqsn7NQP9ANx4oknsm7dOt544w0mT57Mr371qwbnTJw4sc7qg/v27WPp0qVMnDix2a93IIGeKsvJpqK6beZPk5V1hDpAJS5iCfRyoHut7W5V+2orA0LuvsfdPwDeJRzwaeMb3/gGADNnzuRvf/sb+fn5DZbYbWyZ1/357LPPOPLIIxvsLywsrDMz88UXX6RHjx706NGDr7/+mp/97GcMHDiQ3Nxc7r///sh5d911Fzk5OeTl5TFz5kyWLl1KcXExl1xyCfn5+fzzn//k2Wef5ZRTTiEnJ4cf/OAHfPXVVwD07NmTG264gf79+7NkyZID+j3J/jXsAP0+odCTCnOJi1ja0NcAJ5lZNuEgnwh8r945zxCumc83s86Em2BKWlo4u9WadX7/Y/uzduraBj/vt8RvNuydd97JPffcwx//+Ecg3ORS7Y477mDEiBE8/PDDbN++nUGDBnH22Wc3WLjr/fffJz8/n88//5ydO3fy6quvNnidnJwcDjroIN544w3y8vJYtGhRZH2Thx56iCOOOII1a9bw1VdfMWTIEEaNGsU//vEPioqKePXVV+nQoQPbtm3jqKOOYu7cudxzzz0UFBSwa9cuLrvsMp599ll69erFpZdeyu9+9ztmzJgBhP9Kee211+L2+5IaGs0ira3JGrq77wWmAyuBd4An3X29md1mZmOrTlsJVJrZBuA54GfuXtlahU5WjS3zWl91k8v777/Pvffey9SpU6Ner/qhD3v37uWZZ55hwoQJkdd59NFHyc/PZ/DgwVRWVvLee++xatUqpkyZQocOHYDoy8lu3LiR7OxsevXqBcDkyZMjqzcCXHzxxS3+PUhDNQ+n+C4Kc2ktMU0scvcVwIp6+26u9b0DP6n6ipuW1qzjWTOP6fUOYJnXsWPHNrpE78SJExk1ahRDhw4lNzeXb37zm5HXmTNnTmS98GrR1hlvrvp/TUjL1azNUgrcSCDwhFZNlFahqf/NtL/lZBtb5nV/XnrpJU488cSox0488UQ6d+7MzJkzGywn+7vf/S7yKLt3332XL7/8kpEjRzJ//vzI0rTRlpPt3bs3mzdvZtOmTQA89thjKbOcbCqqGc1yH3AMweAhCnNpNZr630y5ubm0a9eOvLw8LrvsMk455ZTIsZtuuokZM2aQm5vLvn37yM7OjrS111bdhu7uBAIBHnzwwUZfr7CwkJkzZ/Kd73wnsu+KK65g8+bN9O/fH3enS5cuPPPMM4wZM4Z169ZRUFBAIBDg3HPP5Ve/+lXkmaft27dn9erVzJ8/nwkTJrB3714GDhzI1VdfHd9fkgDR2sxPoqhITxqS1qPlc0WiaOlnUR2g0lr2t3yumlxE4kxhLomiQBeJo5rRLB8C2xTm0qaSrg3d3SMPVBZJhANthiwpKeHMMy+ivLwC2EtW1ghCIc0AlbaTVDX0YDBIZWVli56BKdIS7k5lZSXBYLBZP1fdzFJevhD4Q9XDKRTm0raSqoberVs3ysrKSMaFuyRzBINBunXrFvP5DdvMH9XaLJIQSRXohxxyCNnZ2YkuhkjM1AEqySSpAl0klZSUlDBs2HAqKpagMJdkoEAXOQDVo1m2bCkFfkwg0EFhLgmXVJ2iIqmgZjr/SACCwTcU5pIUVEMXaYb6beZZWc9SVHS/wlySgmroIjGK1gEaCinMJXko0EVioNEskgoU6CJNqJnO3ws4VmEuSUtt6CL7ER6aOKJqNMsTZGX9jVDoQYW5JCXV0EUaUd3MsmXL08DQqun8CnNJ
Xqqhi0TRsM38IE3nl6QXUw3dzMaY2UYz22RmM6Mcv8zMtprZuqqvK+JfVJG2oQ5QSVVN1tDNrB0wDxgJlAFrzCzk7hvqnbrY3ae3QhlF2kx4CdzvUFGxGIW5pJpYauiDgE3uXuLuu4FFwLjWLZZI26sezVJe/hbwV4W5pJxYAv04YEut7bKqffVdaGZvmtlSM+se7UJmNtXMis2sWEvkSjKpmc7/NbCPYPAnCnNJOfEa5bIc6OnuucBfgEeineTuD7h7gbsXdOnSJU4vLdIyddvMQ2RltVcHqKSkWAK9HKhd4+5WtS/C3Svd/auqzQeBAfEpnkjratgB+lNCoWcU5pKSYgn0NcBJZpZtZgFgIhCqfYKZHVtrcyzwTvyKKNI6NJpF0k2To1zcfa+ZTQdWAu2Ah919vZndBhS7ewj4sZmNBfYC24DLWrHMIi1WM53/HhTmki4sUQ9kLigo8OLi4oS8tmS26jAvLS0FOhEInMzy5TcpzCUlmNlady+IdkxT/yWj1IxmuQ4IEAx+qTCXtKFAl4xRt818CoFArkazSFpRoEtGiN4BeofCXNKKAl3SnkazSKZQoEtaqxnNsgcIKswlrWn5XElbJSUlDB16HmVlpUApWVk5hEJLFeaStlRDl7RU3cxSVvYM8LOqh1MozCW9KdAl7TRsMy/WaBbJCAp0SSvqAJVMpjZ0SRvhBzqPpqIihMJcMpECXdJC9WiWLVtKgTsJBLYozCXjKNAl5dU0s2QDpQSDT6rNXDKSAl1SWt028yPJyjqZoqKFCnPJSAp0SVnROkBDIYW5ZC6NcpGUpNEsIg0p0CXl1EznvwCFuUgNBbqklLoPp5hFIDBEYS5SRYEuKaPm4RT3Az0IBrNYvvwehblIFXWKSkpo2Gbek6KiBxTmIrXEVEM3szFmttHMNpnZzP2cd6GZuZlFfd6dyIGI3gH6c4W5SD1NBrqZtQPmAecAfYBCM+sT5bzDgH8FXo13ISVzlZSUcOaZGs0iEotYauiDgE3uXuLuu4FFwLgo5/0SuAvYFcfySQar7gAtL98EVCjMRZoQS6AfB2yptV1WtS/CzPoD3d39v/d3ITObambFZla8devWZhdWMke4Zj6B0tL/A7aTlXWuwlykCS0e5WJmBwH/AVzX1Lnu/oC7F7h7QZcuXVr60pKmqtvMy8ufAB6rejiF1mYRaUosgV4OdK+13a1qX7XDgH7A82a2GTgVCKljVA5Eww7Q32mhLZEYxRLoa4CTzCzbzALARCBUfdDdd7h7Z3fv6e49gVeAse5e3CollrSl6fwiLdNkoLv7XmA6sBJ4B3jS3deb2W1mNra1CyiZIfxwiuFUVNyLwlzkwMQ0scjdVwAr6u27uZFzh7W8WJJJ6j6c4ocEAp0U5iIHQFP/JaFqpvOHR8IGg+8qzEUOkKb+S8LUbzPPyvozRUVzFeYiB0g1dEmI6A+nUJiLtIQCXdqcRrOItA4FurSpmodTdAO+qTAXiSO1oUubCQ9NPKtqNEspWVmvEgotUJiLxIlq6NImqptZtmwpAs6pms6vMBeJJ9XQpdU1bDPfpen8Iq1ANXRpVeoAFWk7qqFLqwkvgTueioolKMxFWp8CXVpFzcMpSoHlBAIrFOYirUyBLnFX08wS/ngFgzepzVykDSjQJa7qtpnvJStrEEVFyxTmIm1AgS5xE306v8JcpK1olIvEhUaziCSeAl1arGY6/49RmIskjgJdWqQ6zEtLS4HpBAKjFeYiCaJAlwNW83CKXwDtCQZ3s3z5LIW5SIIo0OWA1G0zn0ggkKuhiSIJFlOgm9kYM9toZpvMbGaU41eb2Vtmts7MXjKzPvEvqiSL6B2gtynMRRKsyUA3s3bAPOAcoA9QGCWwn3D3HHfPB+4G/iPuJZWkoNEsIskrlhr6IGCTu5e4+25gETCu9gnu/lmtzUMBj18RJVnUjGb5AminMBdJMrFMLDoO2FJruwwYXP8kM5sG/AQIACOiXcjMpgJTAY4/
/vjmllUSqKSkhKFDx1JWVgpAVlY+odDTCnORJBK3TlF3n+fuJwI3AL9o5JwH3L3A3Qu6dOkSr5eWVlbdzFJW9hQwq+rhFApzkWQTS6CXA91rbXer2teYRcD4lhRKkkfDNvMXNJpFJEnFEuhrgJPMLNvMAsBEIFT7BDM7qdbmecB78SuiJIo6QEVSS5Nt6O6+18ymAyuBdsDD7r7ezG4Dit09BEw3s7OBPcCnwOTWLLS0vvADnUdSURFCYS6SGsw9MQNSCgoKvLi4OCGvLftXdzr/RQQClQpzkSRhZmvdvSDaMS2fK3XUNLP0AUoJBv+oNnORFKFAl4i6beadycrqQ1HRYwpzkRShQBegsYdTKMxFUokW5xKNZhFJEwr0DFcznX8ECnOR1KZAz2B1R7PcTSAwWGEuksIU6Bmq5uEUC4BeBINBli+/V2EuksIU6Bmobpt5AYFAVw1NFEkDCvQME70D9EaFuUgaUKBnkJKSEs48U6NZRNKVAj1DVHeAlpe/A7ynMBdJQ5pYlAGqa+bl5ZXAlwSDE9RmLpKGVENPc9Vt5uXljwOLCQaDCnORNKUaehpr2AF6jcJcJI2php6mNJ1fJPOohp6GaqbzL0BhLpI5FOhppu50/qsJBLpqnLlIhlCTSxqpmc5fCEAwWKowF8kgqqGniYZt5iGKirQ2i0gmiamGbmZjzGyjmW0ys5lRjv/EzDaY2Ztm9qyZ9Yh/UaUx0TtAFeYimabJQDezdsA84BygD1BoZn3qnfY6UODuucBS4O54F1Si02gWEakWSw19ELDJ3UvcfTewCBhX+wR3f87dd1ZtvgJ0i28xJZqa0Sydgc4Kc5EMF0sb+nHAllrbZcDg/Zx/OfCnaAfMbCowFeD444+PsYgSTUlJCcOGjWTLllKglKyskwmFHleYi2SwuI5yMbPvAwXA7GjH3f0Bdy9w94IuXbrE86UzSnUzy5YtIeAigsGgwlxEYqqhlwPda213q9pXh5mdDfwbMNTdv4pP8aS+hm3mlZrOLyJAbDX0NcBJZpZtZgFgIhCqfYKZnQLcD4x194/jX0wBdYCKyP41GejuvheYDqwE3gGedPf1ZnabmY2tOm028A1giZmtM7NQI5eTA1RSUsLQoWMV5iLSqKl/lXkAAAmASURBVJgmFrn7CmBFvX031/r+7DiXS2qpHs1SVlYKLCQQeFFhLiINaKZokqtpZjkUgGDwLrWZi0hUCvQkVrfNvB1ZWfkUFT2tMBeRqBToSSpaB2gopDAXkcZptcUkpNEsInIgFOhJpmY6/1QU5iLSHAr0JFL34RT/SiAwUmEuIjFToCeJmodT3AEcTjDoLF9+m8JcRGKmQE8CddvMxxEI9NXQRBFpNgV6gkXvAJ2lMBeRZlOgJ5BGs4hIPGkceoLUjGZpB+xVmItIiynQE6CkpIQzzxxPefkWwMnKGkQotExhLiItoiaXNlbdzFJevgSYXfVwCoW5iLScAr0NNWwz/2+NZhGRuFGgtxF1gIpIa1MbehsIP9D5LCoqilCYi0hrUaC3surRLFu2lAIzCQR2KcxFpFUo0FtRTTNLf6CUYPA5tZmLSKtRoLeSum3m3yQr61WKihYozEWk1cTUKWpmY8xso5ltMrOZUY6faWavmdleM7so/sVMLQ07QL9DKKQwF5HW1WSgm1k7YB5wDtAHKDSzPvVOKwUuA56IdwFTjUaziEiixFJDHwRscvcSd98NLALG1T7B3Te7+5vAvlYoY8qomc5/KgpzEWlrsQT6ccCWWttlVfuazcymmlmxmRVv3br1QC6RtMJDE6sfTjGXrKwBCnMRaVNtOrHI3R9w9wJ3L+jSpUtbvnSrqm5m2bLlCSC3ajr/XIW5iLSpWEa5lAPda213q9onRGsz76ShiSKSELHU0NcAJ5lZtpkFgIlAqHWLlRrUASoiyaTJQHf3vcB0YCXwDvCku683s9vMbCyAmQ00szJgAnC/ma1vzUIng/ASuBMU5iKSNGKaWOTu
K4AV9fbdXOv7NYSbYjJC9WiW8vL/A94kELhGYS4iCaeZos0UrplfTHn5Z8BXBIOXqs1cRJKCls9thpqHUzwOPE1WVlBhLiJJQzX0GDXsAL2GUEhhLiLJQzX0GGg0i4ikAtXQm1Aznf9+FOYikswU6PtRHebh6fxXEQhks3z5zxXmIpKU1OTSiOpmltLSywEjGPxYYS4iSU019CgatpkvpajoHoW5iCQ11dDrid4BqjAXkeSnQK9Fo1lEJJUp0KvUjGY5FDhSYS4iKUdt6FQ/nGI0W7aUAqVkZZ1MKLRQYS4iKSXja+g1D6cIAZOrHk6hMBeR1JPRNfSGbealWptFRFJWxtbQ1QEqIukmI2voJSUlDB16HhUVz6AwF5F0kXGBXj2apaysFPgDgcBrCnMRSQsZFeg1zSxHAaUEg/PUZi4iaSNjAr1um3mQrKwcioqWKsxFJG3E1ClqZmPMbKOZbTKzmVGOZ5nZ4qrjr5pZz3gXtCUadoBOIRRSmItIemky0M2sHTAPOAfoAxSaWZ96p10OfOru/w/4DXBXvAt6oDSaRUQyRSxNLoOATe5eAmBmi4BxwIZa54wDZlV9vxSYa2bm7h7Hsjbbhx9+WDWd/zqY9S0AdgOjVz8Hq/f/s/2P7c/aqWsj23arAeC31NzSgAcG8NpHrzWrTNF+vvjKYgZ0HQDA1OVT+cNrf2jWNaP9/P3n38/UAVMBeGDtA1z1x6uadc1oP39l/yt54IIHAFhbsZaCPxQ065rRfr6x33Os9D7pfar/8239PtUuQ6LF0uRyHLCl1nZZ1b6o57j7XmAH0Kn+hcxsqpkVm1nx1q1bD6zEzdCxY0eOOeYY4Get/loiIolmTVWizewiYIy7X1G1PQkY7O7Ta53zdtU5ZVXb71ed80lj1y0oKPDi4uI43ML+7dixg3HjxvHzn+vhFCKS+sxsrbtH/bMrliaXcqB7re1uVfuinVNmZgcDRwCVB1DWuDviiCN47rnnMGven4giIqkmliaXNcBJZpZtZgFgIhCqd054Zauwi4C/Jrr9vDaFuYhkgiZr6O6+18ymAyuBdsDD7r7ezG4Dit09BDwEPGZmm4BthENfRETaUEwTi9x9BbCi3r6ba32/C5gQ36KJiEhzZOxqiyIi6UaBLiKSJhToIiJpQoEuIpImmpxY1GovbLYV+LANX7Iz0OhEpzSg+0td6XxvoPuLtx7u3iXagYQFelszs+LGZlelA91f6krnewPdX1tSk4uISJpQoIuIpIlMCvQHEl2AVqb7S13pfG+g+2szGdOGLiKS7jKphi4iktYU6CIiaSKtAj3VH2bdlBju7ydmtsHM3jSzZ82sRyLKeaCaur9a511oZm5mSTFULFax3J+ZfbfqPVxvZk+0dRlbIobP5/Fm9pyZvV71GT03EeU8EGb2sJl9XPUwn2jHzczuq7r3N82sf1uXEQB3T4svwkv7vg+cAASAN4A+9c75IfD7qu8nAosTXe44399woEPV99ek2/1VnXcY8CLwClCQ6HLH+f07CXgdOLJq++hElzvO9/cAcE3V932AzYkudzPu70ygP/B2I8fPBf4EGHAq8GoiyplONfTIw6zdfTdQ/TDr2sYBj1R9vxQ4y1Ln6RdN3p+7P+fuO6s2XyH8dKlUEcv7B/BL4C5gV1sWLg5iub8rgXnu/imAu3/cxmVsiVjuz4HDq74/Aqhow/K1iLu/SPhZD40ZBzzqYa8AHc3s2LYpXY10CvS4Pcw6ScVyf7VdTrjGkCqavL+qP2O7u/t/t2XB4iSW968X0MvMXjazV8xsTJuVruViub9ZwPfNrIzw8xV+1DZFaxPN/ffZKmJ6wIWkFjP7PlAADE10WeLFzA4C/gO4LMFFaU0HE252GUb4r6sXzSzH3bcntFTxUwgscPdfm9lphJ9y1s/d9yW6YOkinWrozXmYNcn2MOsYxHJ/mNnZwL8BY939qzYqWzw0dX+HAf2A581sM+F2ylAKdYzG8v6VASF3
3+PuHwDvEg74VBDL/V0OPAng7quBIOGFrdJBTP8+W1s6BXrKP8y6CU3en5mdAtxPOMxTqf0Vmrg/d9/h7p3dvae79yTcRzDW3YsTU9xmi+Xz+Qzh2jlm1plwE0xJWxayBWK5v1LgLAAzO5lwoG9t01K2nhBwadVol1OBHe7+UZuXItG9x3HuiT6XcK3mfeDfqvbdRvgfPoQ/QEuATcDfgRMSXeY4398q4P+AdVVfoUSXOZ73V+/c50mhUS4xvn9GuFlpA/AWMDHRZY7z/fUBXiY8AmYdMCrRZW7GvS0EPgL2EP5L6nLgauDqWu/dvKp7fytRn01N/RcRSRPp1OQiIpLRFOgiImlCgS4ikiYU6CIiaUKBLiKSJhToIiJpQoEuIpIm/j9eUUSk+BRFsQAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "unit_query_vector = query_vector / norm(query_vector)\n",
    "unit_title_a_vector = title_a_vector / norm(title_a_vector)\n",
    "assert np.allclose(unit_query_vector, unit_title_a_vector)\n",
    "unit_title_b_vector = title_b_vector  # already a unit vector, so no normalization is needed\n",
    "\n",
    "plt.plot([0, unit_query_vector[0]], [0, unit_query_vector[1]], c='k', \n",
    "         linewidth=3, label='Normalized Query Vector')\n",
    "plt.plot([0, unit_title_a_vector[0]], [0, unit_title_a_vector[1]], c='b', \n",
    "         linestyle='--', label='Normalized Title A Vector')\n",
    "plt.plot([0, unit_title_b_vector[0]], [0, unit_title_b_vector[1]], c='g', \n",
    "         linewidth=2, linestyle='-.', label='Title B Vector')\n",
    "\n",
    "plt.axis('equal')\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
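  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Normalization has one edge case: the all-zero vector has a magnitude of 0, so `v / norm(v)` would divide by zero. The cell below is a minimal sketch of a reusable normalization helper that guards against that case. The name `safe_normalize` and the convention of returning a zero vector unchanged are our own assumptions, not part of the search engine built in this chapter.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def safe_normalize(v):\n",
    "    # Scale v to unit length; the all-zero vector cannot be normalized,\n",
    "    # so (as an assumed convention) it is returned unchanged\n",
    "    v = np.asarray(v, dtype=float)\n",
    "    magnitude = np.linalg.norm(v)\n",
    "    if magnitude == 0:\n",
    "        return v.copy()\n",
    "    return v / magnitude\n",
    "\n",
    "# [3, 4] has magnitude 5, so its unit vector is [0.6, 0.8]\n",
    "print(safe_normalize([3, 4]))\n",
    "print(safe_normalize([0, 0]))"
   ]
  },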
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The normalized query vector and the normalized Title A vector are now indistinguishable. Meanwhile, the Title B vector diverges from the query vector, because the two segments point in different directions. As a consequence, Title A should now outrank Title B, relative to the query.\n",
    "\n",
    "**Listing 13. 29. Ranking titles by unit-vector similarity**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "'A: Pepperoni Pizza! Pepperoni Pizza! Pepperoni Pizza!' has a normalized query similarity of 1.0000\n",
      "'B: Pepperoni' has a normalized query similarity of 0.5469\n"
     ]
    }
   ],
   "source": [
    "unit_title_vectors = [unit_title_a_vector, unit_title_b_vector]\n",
    "similarities = [tanimoto_similarity(unit_query_vector, unit_title_vector)\n",
    "                for unit_title_vector in unit_title_vectors]\n",
    "\n",
    "for index in sorted(range(len(titles)), key=lambda i: similarities[i], \n",
    "                    reverse=True):\n",
    "    title = titles[index]\n",
    "    similarity = similarities[index]\n",
    "    print(f\"'{title}' has a normalized query similarity of {similarity:.4f}\")"
   ]
  },
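  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see why title length stops mattering, the sketch below compares raw and normalized Tanimoto similarities for hypothetical titles that repeat _Pepperoni Pizza!_ 1, 2, 3, and 10 times. The helper `raw_tanimoto` is our own restatement of the vector Tanimoto formula `a @ b / (a @ a + b @ b - a @ b)`, assumed to match the notebook's `tanimoto_similarity`; the `demo_` variables are purely illustrative.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def raw_tanimoto(a, b):\n",
    "    # Tanimoto similarity of two vectors: a @ b / (a @ a + b @ b - a @ b)\n",
    "    shared = a @ b\n",
    "    return shared / (a @ a + b @ b - shared)\n",
    "\n",
    "# Hypothetical query as [pepperoni count, pizza count]\n",
    "demo_query = np.array([1, 1])\n",
    "demo_unit_query = demo_query / np.linalg.norm(demo_query)\n",
    "for repeats in [1, 2, 3, 10]:\n",
    "    # Hypothetical title repeating 'Pepperoni Pizza!' `repeats` times\n",
    "    demo_title = np.array([repeats, repeats])\n",
    "    demo_unit_title = demo_title / np.linalg.norm(demo_title)\n",
    "    raw = raw_tanimoto(demo_query, demo_title)\n",
    "    normalized = raw_tanimoto(demo_unit_query, demo_unit_title)\n",
    "    print(f'{repeats:>2} repeats: raw {raw:.4f}, normalized {normalized:.4f}')"
   ]
  },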
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Vector normalization has fixed a flaw in our search engine: the search engine is no longer overly sensitive to title length. In the process, we have also made our Tanimoto computation more efficient. Given two unit vectors `u1` and `u2`, their Tanimoto similarity reduces to `u1 @ u2 / (2 - u1 @ u2)`. Taking the dot product of each vector with itself is no longer necessary. The only required vector computation is `u1 @ u2`.\n",
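    "\n",
    "The reduction follows from the general Tanimoto formula. For unit vectors, $u_1 \\cdot u_1 = u_2 \\cdot u_2 = 1$, so the denominator simplifies:\n",
    "\n",
    "$$T(u_1, u_2) = \\frac{u_1 \\cdot u_2}{u_1 \\cdot u_1 + u_2 \\cdot u_2 - u_1 \\cdot u_2} = \\frac{u_1 \\cdot u_2}{2 - u_1 \\cdot u_2}$$\n",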
    "\n",
    "**Listing 13. 30. Computing a unit-vector Tanimoto similarity**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [],
   "source": [
    "def normalized_tanimoto(u1, u2):\n",
    "    dot_product = u1 @ u2\n",
    "    return dot_product / (2 - dot_product)\n",
    "\n",
    "for unit_title_vector in unit_title_vectors:\n",
    "    similarity = normalized_tanimoto(unit_query_vector, unit_title_vector)\n",
    "    # Floating-point rounding can differ slightly between the two formulas\n",
    "    assert np.isclose(similarity, tanimoto_similarity(unit_query_vector, \n",
    "                                                      unit_title_vector))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The dot product of 2 unit vectors is a very special value. It can easily be converted into the angle between the vectors, and also into the spatial distance between them.\n",
    "\n",
    "#### Utilizing Unit-Vector Dot Products to Convert Between Relevance Metrics\n",
    "The unit-vector dot product unites multiple types of comparison metrics. We've just seen how `tanimoto_similarity(u1, u2)` is a direct function of `u1 @ u2`. As it turns out, the Euclidean distance between unit vectors is also a function of `u1 @ u2`. It's not difficult to prove that `euclidean(u1, u2)` equals `(2 - 2 * u1 @ u2) ** 0.5`. Additionally, `u1 @ u2` equals the cosine of the angle between the unit vectors. Thus, `u1 @ u2` is commonly referred to as the **cosine similarity**.\n",
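    "\n",
    "Both identities can be checked by expanding the squared distance. Because $u_1 \\cdot u_1 = u_2 \\cdot u_2 = 1$,\n",
    "\n",
    "$$\\lVert u_1 - u_2 \\rVert^2 = u_1 \\cdot u_1 - 2\\, u_1 \\cdot u_2 + u_2 \\cdot u_2 = 2 - 2\\, u_1 \\cdot u_2,$$\n",
    "\n",
    "and because $u_1 \\cdot u_2 = \\lVert u_1 \\rVert \\lVert u_2 \\rVert \\cos\\theta = \\cos\\theta$ for unit vectors separated by an angle $\\theta$, the Euclidean distance equals $\\sqrt{2 - 2\\cos\\theta}$.\n",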
    "\n",
    "The code below illustrates how easy it is to convert between the Tanimoto similarity, the cosine similarity, and the Euclidean distance.\n",
    "\n",
    "**Listing 13. 31. Converting between unit-vector metrics**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "We are comparing Normalized Query Vector and Normalized Title A vector\n",
      "The Tanimoto similarity between vectors is 1.0000\n",
      "The cosine similarity between vectors is 1.0000\n",
      "The Euclidean distance between vectors is 0.0000\n",
      "The angle between vectors is 0.0000 degrees\n",
      "\n",
      "We are comparing Normalized Query Vector and Title B Vector\n",
      "The Tanimoto similarity between vectors is 0.5469\n",
      "The cosine similarity between vectors is 0.7071\n",
      "The Euclidean distance between vectors is 0.7654\n",
      "The angle between vectors is 45.0000 degrees\n",
      "\n"
     ]
    }
   ],
   "source": [
    "unit_vector_names = ['Normalized Title A vector', 'Title B Vector']\n",
    "u1 = unit_query_vector\n",
    "\n",
    "for unit_vector_name, u2 in zip(unit_vector_names, unit_title_vectors):\n",
    "    similarity = normalized_tanimoto(u1, u2)\n",
    "    cosine_similarity = 2 * similarity / (1 + similarity)\n",
    "    assert np.isclose(cosine_similarity, u1 @ u2)\n",
    "    # Clipping guards against rounding pushing the cosine slightly past 1\n",
    "    angle = np.arccos(np.clip(cosine_similarity, -1, 1))\n",
    "    euclidean_distance = (2 - 2 * cosine_similarity) ** 0.5\n",
    "    assert np.isclose(euclidean_distance, euclidean(u1, u2))\n",
    "    measurements = {'Tanimoto similarity': similarity,\n",
    "                    'cosine similarity': cosine_similarity,\n",
    "                    'Euclidean distance': euclidean_distance,\n",
    "                    'angle': np.degrees(angle)}\n",
    "    \n",
    "    print(\"We are comparing Normalized Query Vector and \"\n",
    "          f\"{unit_vector_name}\")\n",
    "    for measurement_type, value in measurements.items():\n",
    "        output = f\"The {measurement_type} between vectors is {value:.4f}\"\n",
    "        if measurement_type == 'angle':\n",
    "            output += ' degrees\\n'\n",
    "        \n",
    "        print(output)"
   ]
  },
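  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "These conversions hold for any pair of unit vectors, not only our two title vectors. The sketch below checks them numerically on randomly generated unit vectors; the function name `check_unit_vector_conversions`, the random seed, and the 3-dimensional vectors are arbitrary choices of ours. It also uses the inverse relationship `1 - d ** 2 / 2`, which recovers the cosine similarity from a Euclidean distance `d`.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def check_unit_vector_conversions(seed=0, trials=5, dim=3):\n",
    "    # Compare metrics on random unit vectors (arbitrary seed and dimension)\n",
    "    rng = np.random.default_rng(seed)\n",
    "    results = []\n",
    "    for _ in range(trials):\n",
    "        v1, v2 = rng.normal(size=(2, dim))\n",
    "        w1, w2 = v1 / np.linalg.norm(v1), v2 / np.linalg.norm(v2)\n",
    "        cos_sim = w1 @ w2\n",
    "        tanimoto = cos_sim / (2 - cos_sim)\n",
    "        distance = np.linalg.norm(w1 - w2)\n",
    "        # Each metric can be recovered from any of the others\n",
    "        assert np.isclose(2 * tanimoto / (1 + tanimoto), cos_sim)\n",
    "        assert np.isclose((2 - 2 * cos_sim) ** 0.5, distance)\n",
    "        assert np.isclose(1 - distance ** 2 / 2, cos_sim)\n",
    "        results.append((cos_sim, tanimoto, distance))\n",
    "    return results\n",
    "\n",
    "for cos_sim, tanimoto, distance in check_unit_vector_conversions():\n",
    "    print(f'cosine {cos_sim:.4f}, Tanimoto {tanimoto:.4f}, distance {distance:.4f}')"
   ]
  },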
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Vector normalization allows us to swap between multiple comparison metrics. Normalization also enables a more efficient computation of the similarity between every pair of vectors, which is called the **all-by-all similarity**. The all-by-all similarity can be elegantly computed using **matrix multiplication**. In mathematics, matrix multiplication generalizes the dot product from 1-dimensional vectors to 2-dimensional arrays. The generalized dot product leads to the efficient computation of similarities across all pairs of texts.\n",
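    "\n",
    "As a small illustration (our own sketch, using a hypothetical 3-by-2 matrix of unit vectors rather than the texts in this chapter), multiplying a matrix of row vectors by its transpose yields every pairwise dot product at once. When the rows are unit vectors, those dot products are the pairwise cosine similarities.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Each row is a hypothetical unit vector\n",
    "unit_matrix = np.array([[1.0, 0.0],\n",
    "                        [0.6, 0.8],\n",
    "                        [0.0, 1.0]])\n",
    "# Entry (i, j) of the product is the dot product of rows i and j\n",
    "cosine_table = unit_matrix @ unit_matrix.T\n",
    "assert np.isclose(cosine_table[0, 1], unit_matrix[0] @ unit_matrix[1])\n",
    "print(cosine_table)\n",
    "```\n",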
    "\n",
    "## 13.3. Matrix Multiplication for Efficient Similarity Calculation\n",
    "\n",
    "When analyzing our _seashell_-centric texts, we compared each pair of texts individually. What if, instead, we visualized all pairwise similarities in a single table? The rows and columns of the table would correspond to individual texts, and each cell would hold the similarity between its row text and its column text. The table would provide a bird's-eye view of all the relationships between texts. We would finally learn whether `text1` is more similar to `text2` or to `text3`.\n",
    "\n",
    "Let's generate a table of normalized Tanimoto similarities.\n",
    "\n",
    "**Listing 13. 32. Computing a table of normalized Tanimoto similarities**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXkAAAD4CAYAAAAJmJb0AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAAfuUlEQVR4nO3deZQV1bn38e/Tp0FbkKEbaFAaArFVEAxEgzihl0FQ4nzxRbMM0RDCq3CNvomYexURlhOoxCjYEoUQDRoZbsCgEsUgUSODEURRAigyKFM3gwwidD/vH110To/ndNOc06f4fVy1PLtqV9VzarGe3mfXrl3m7oiISDilJTsAERE5epTkRURCTEleRCTElORFREJMSV5EJMTSkx1ATWW0uV7Dgo6isa/dlOwQjgnnND+Y7BBCr1vz/nakx6hOvtm//oUjPl9tUkteRCTEUrYlLyKSKGap2x5WkhcRiSHNUjdVpm7kIiIJopa8iEiImdWpe6nVoiQvIhKTWvIiIqGl7hoRkRBTkhcRCTGNrhERCTG15EVEQkxJXkQkxAwNoRQRCS215EVEQiwtLXVTZepGLiKSMGrJi4iElrprRERCTEleRCTETN01IiLhpZa8iEiIpaVFkh1CjSnJi4jEoO4aEZEQU3eNiEiIKcmLiISYumtERELMNK2BiEh4pfKLvFP3N4iISIIYaXEvcR3PrJ+ZrTKzNWZ2VwXb25rZfDP70MwWmFnrqG2DzGx1sAyKdS4leRGRGMzS4l5iH8siwATgUqAjcL2ZdSxT7RHgD+5+JjAaeDDYNxO4FzgH6Abca2ZNqzqfkryISCxm8S+xdQPWuPtn7v4t8CJwZZk6HYE3g89/i9reF3jd3QvcfQfwOtCvqpMpyYuIxJIW/2JmQ8xsadQypMzRTgY2RJU3BuuiLQeuCT5fDZxoZllx7luKbryKiMSSFn972N0nAZOO8Iy/BJ40s58AC4FNQGFNDqQkf5Tljfs5l/bqyrb83Zzd585kh5OyvvjnSv7+7Ey8qIiOvc/lrGsvKbX9kzff452ps2mY2RiAzpf14Iw+5wEwZ/RENq9aR6sO7bn87qEJjz1VfPjeJzz3+J8pKiri4h925/Ibe1VYb8mC5fz27qnc98zttD89p2T99s07uOvGh7n6pr70v+E/EhV2YtRun8cmICeq3DpYV8LdvyRoyZtZQ+Bad99pZpuAi8vsu6Cqk1WZ5IOfB/ODYkuK/5JsC8rdgv6kmIKbBde5e14l26cClwGb3L1LPMdMFc9Nf4u8qfN4ZvwtyQ4lZRUVFvHWpOlcOepWGmY14aU7x9GuW2cyc1qVqpd7flcuGnJduf27XtWLQwe+5aN57yQq5JRTVFjE1MdmMWL8UDJbNGbk4PF8/4IzOLldy1L19u/7hnnT/853O7Ypd4xpT87mzHM6JCrkhPLaHUK5BMg1s3YUJ/eBwA3RFcysGVDg7kXAr4HJwaZ5wANRN1svCbZXqsq/T+6e7+5dgsSbB4w/XI43wQcygaqaUJOB/tU4Xsp4Z/GnFOzck+wwUtqW1V/QuFUzGrdsRqReOrkXnMVni1fEvX/OmadRL+P4oxhh6lv7yXqyWzejxclZpNdLp3vvrrz/9kfl6s383av88Ec9qVe/Xqn1SxeuoHmrTFq3y05UyIll1VhicPdDwDCKE/YnwEvu/rGZjTazK4JqFwOrzOxfQDZwf7BvATCG4j8US4DRwbpK1fhHSDBWc7GZLTOziWaWZmbtgrGbmWYWMbN3zawn8BBwWlD3oQq+9FtAlYHKsWtvwU5ObPbvUWINs5qwN39nuXpr31vOC794kFfHPsvX23ckMsSUt2PbLjJbNCkpZzZvwo5tu0rVWbdqI/lbd9LlvNKj/b7Zd4C5f3yTq2/qm5BYkyLN4l/i4O6vuPup7v5ddz+cwEe6+5zg8wx3zw3qDHb3A1H7Tnb3U4JlSqxz
1ahP3sw6UXzH9zx3P2Rmk4CB7j7NzB4FJlJ8d/gDd3/TzNYDp4StK0bqju+c3ZlTLzyLSL16fDTvbd54/DmuHvNfyQ4rNIqKivjjE7MZ8j/Xl9s2a/I8+l13EcefcFwSIkuQFH7itaY3XnsDPwCWBo/7ZhAM63H3PDMbANwEdK2NIA8LhiINAUhvejbpDU+pzcNLHdUgs0mplvme/J00yGpSqk5GowYlnzv2Po93/zA7YfGFQdPmjSnY+u9fRwXbdtK0eeOS8jf7DrDx8808MHwCALsKvmb8iGe5/eGfsnblFyxZsJwXn3qZfXv2Y2bUPy6dPtdemPDvcdREjr0kb8Bkd7+n3IbiO8EnARGgIbC35uGVFj00KaPN9V5bx5W6LTu3Dbu+2sbuLdtpkNmE1W+/zyW3/6RUnb0Fu2gQjKz5fMkKmrZuWcGRpDLtT89h84ZtbP0yn8zmjXnvjQ+45d4bS7af0DCDp+aOKSnfP2wC1w+7gvan53DPxOEl62c9+xrHZRwXrgQPx2RL/g1ghpk97u7bg1E4Ddx9PTAOmAJsAZ4GrgK+Bk6sjYBTzdQnhnPhuR1o1vRE1ix6kjGPzWDqnxYkO6yUkhaJ0ONnA5h930S8yOnYqztZbVqxaNpcWpzShnbdOrN87lusW7ICi6RxfMMG9B7+o5L9Z/73eHZs2srBbw4wZfA99Lz1Btp2DecokJqKpEf48R3XMO6OSRQVFdGjfzdat2/JzGdepd3pOXz/gk7JDjG5UjfHY+7xNYjNbBSwx90fCco3AHdSfPP2IMWjZxpRfOf3QncvNLM5wHR3f87MXgI6AHPd/a4yx54OXABkAVuBu93991XFo5b80TX2tZuSHcIx4ZzmB5MdQuh1a97/iFN0br/Jceeb1a/dXKf+JMTdknf3UWXK04BpFVSdH1XniqjP5Qcw/3vbgHjjEBFJuDqVtqtHT7yKiMTgkdSd5ktJXkQkFrXkRURC7BgcXSMicuyI80nWukhJXkQkltTN8UryIiIxqbtGRCTEjsFpDUREjh1qyYuIhFjq5ngleRGRWFyja0REQkzdNSIiIZa6OV5JXkQkJs1dIyISYmrJi4iEmG68ioiEmJK8iEh4eermeCV5EZGYdONVRCTE1F0jIhJiqduQV5IXEYlJT7yKiISYumtERMLL1ZIXEQmxdCV5EZHwUkteRCTE1CcvIhJiqZvjleRFRGLRm6FERMIshZN8Cj/HJSKSIBGLf4mDmfUzs1VmtsbM7qpg+3gzWxYs/zKznVHbCqO2zYl1rpRtyY997aZkhxBqd/abkuwQjgn719+X7BAkHrU4usbMIsAEoA+wEVhiZnPcfeXhOu5+e1T94UDXqEPsd/cu8Z5PLXkRkVjSLP4ltm7AGnf/zN2/BV4Erqyi/vXACzUOvaY7iogcM6qR5M1siJktjVqGlDnaycCGqPLGYF05ZtYWaAe8GbX6+OC475nZVbFCT9nuGhGRRKnOtAbuPgmYVEunHgjMcPfCqHVt3X2TmbUH3jSzFe6+trIDqCUvIhJL7d543QTkRJVbB+sqMpAyXTXuvin4/2fAAkr315ejJC8iEkvt9skvAXLNrJ2Z1ac4kZcbJWNmpwNNgX9ErWtqZscFn5sB5wMry+4bTd01IiKx1OI4eXc/ZGbDgHlABJjs7h+b2WhgqbsfTvgDgRfd3aN27wA8bWZFFDfSH4oelVMRJXkRkVhq+Vkod38FeKXMupFlyqMq2O9doHN1zqUkLyISg6Y1EBEJM001LCISYnFOV1AXKcmLiMSQlsLjEJXkRURiSOHeGiV5EZFYlORFRELMUjjLK8mLiMSgPnkRkRAzJXkRkfBK4d4aJXkRkVhS+IFXJXkRkVjUkhcRCTEleRGREEvTtAYiIuGllryISIgpyYuIhJiSvIhIiGkIpYhIiKklLyISYhpdIyISYmrJi4iEmJK8iEiIKcmLiISYRteIiIRYWiTZEdScknwt+OKfK/n7szPxoiI6
9j6Xs669pNT2T958j3emzqZhZmMAOl/WgzP6nAfAnNET2bxqHa06tOfyu4cmPPYwyBv3cy7t1ZVt+bs5u8+dyQ4nZS1c+D733/87ioqKGDCgD0OGDCi1fcqUPzN9+l+JRCJkZjbigQdu4+STWwDw5ZdbufvuJ/jqq+2YGZMm3Uvr1tnJ+BpHRWi7a8wsC5gfFFsChcC2oNzN3b+N5yRmlglc5+55FWxrC0wFWgAOPOXuT8YXfvIVFRbx1qTpXDnqVhpmNeGlO8fRrltnMnNalaqXe35XLhpyXbn9u17Vi0MHvuWjee8kKuTQeW76W+RNnccz429Jdigpq7CwkNGj85gyZQzZ2Vn853/eQc+e53DKKW1K6nTo0J6ZMx8jI+N4pk17hXHjpvCb34wAYMSI8Qwdeh3nn9+VvXv3k5bK/RsVSOV3vFb5Uit3z3f3Lu7eBcgDxh8ux5vgA5lAZc3Ug8Av3L0jcC5wu5mdWo1jJ9WW1V/QuFUzGrdsRqReOrkXnMVni1fEvX/OmadRL+P4oxhh+L2z+FMKdu5Jdhgp7cMPV9O2bStyclpSv349+vfvwfz5i0rV6d79TDKCf6tdupzG5s35AKxZs55Dhwo5//yuADRokFFSLyzM4l/qmhq/udDMBpnZYjNbZmYTzSzNzNqZ2WozyzSziJm9a2Y9gYeA04K6D0Ufx92/dPdlwefdwKfAyUfypRJpb8FOTmzWtKTcMKsJe/N3lqu39r3lvPCLB3l17LN8vX1HIkMUiWnLlnxatmxWUs7OzmLLlvxK68+Y8To9epwFwLp1m2jUqAHDhj3AVVfdxsMPT6awsPCox5xIx1ySN7NOwNXAeUErPx0Y6O6fA48CE4E7gQ/c/U3gLmBV8AvgriqO2x7oBCypZPsQM1tqZkvfeemVmoSeFN85uzODnh7F9b/5NTnfO403Hn8u2SGJ1Njs2X/jo4/WMHjwNQAcOlTE0qUrGTHiZmbMeIyNGzcza9b8GEdJLamc5Gt647U38ANgadBXlQFsAHD3PDMbANwEdI33gGbWCJgJDHf3Cn97u/skYBLAEyv/6jWMvVY1yGxSqmW+J38nDbKalKqT0ahByeeOvc/j3T/MTlh8IvHIzs5i8+btJeUtW/LJzs4qV+/dd5eRl/cSzz//IPXr1wOgZcssOnRoR05OSwB69erO8uWrEhN4gqTXuM8j+WoaugGTo/rnT3P3MQBm1hA4CYgADeM6mFl9YBYwxd3n1DCmpMjObcOur7axe8t2Cg8eYvXb79PuB51L1dlbsKvk8+dLVtC0dctEhylSpc6dc1m37ks2bNjMt98eZO7chfTs2a1UnZUr1zJy5ASeeuoesqIaMp0757J7914Kgn/nixZ9WOqGbRikmce91DU1bcm/Acwws8fdfXswCqeBu68HxgFTgC3A08BVwNfAiRUdyIp/CvweWObuv61hPEmTFonQ42cDmH3fRLzI6dirO1ltWrFo2lxanNKGdt06s3zuW6xbsgKLpHF8wwb0Hv6jkv1n/vd4dmzaysFvDjBl8D30vPUG2nbtkMRvlHqmPjGcC8/tQLOmJ7Jm0ZOMeWwGU/+0INlhpZT09AgjRw5l8OB7KSws4tpre5Ob25bHH3+eTp1y6dXrHMaOncK+fd9w223Ft9VatWpOXt49RCIRRoy4mUGD7gacM874LgMGXFL1CVNMKg8WMvf4/vKY2Shgj7s/EpRvoLjfPY3iETJDgUbAGOBCdy80sznAdHd/zsxeAjoAc6P75c3sYuBvwIcUD6EEGOHu86qKp65014TVnf2mJDuEY8L+9fclO4RjwKlHnKL7//XtuPPN3EsuqFN/EuJuybv7qDLlacC0CqrOj6pzRdTn8oPEi9cvoLj7R0SkTqqL3TDxSuHbCSIiiZFm8S/xMLN+ZrbKzNaYWYUjDs3sOjNbaWYfm9m0qPWDgqHqq81sUKxzaVoDEZEY0muxr8HMIsAEoA+wEVhiZnPcfWVUnVzg18D5
7r7DzFoE6zOBe4GzKe7efj/Yt9KHb9SSFxGJwczjXuLQDVjj7p8FMwe8CFxZps7PgAmHk7e7bw3W9wVed/eCYNvrQL+qTqYkLyISQy1315xM8FxRYCPln/I/FTjVzN4xs/fMrF819i1F3TUiIjFUpzVsZkOAIVGrJgUPclZHOpALXAy0BhaaWecq96jiQCIiUoXqjK6JfjK/EpuAnKhy62BdtI3AInc/CHxuZv+iOOlvojjxR++7oKp41F0jIhJDusW/xGEJkBtM6FgfGAiUfdL/zwTJ3MyaUdx98xkwD7jEzJqaWVPgkmBd5bFX43uKiByTavOJV3c/ZGbDKE7OEYqniPnYzEYDS4OpXQ4n85UUv8fjV+6eD2BmY/j3JI6j3b2gqvMpyYuIxFDbD0O5+yvAK2XWjYz67MAdwVJ238nA5HjPpSQvIhJDKs9doyQvIhJDKt+8VJIXEYkhleeuUZIXEYkhlV8aoiQvIhJDCud4JXkRkVjUXSMiEmIaXSMiEmLqrhERCTG15EVEQiySpj55EZHQUneNiEiIaXSNiEiIqU9eRCTElORFREKsnrprRETCSy15EZEQU5IXEQmxiJK8iEh4qSUvIhJiGicvIhJi9dSST7xzmh9Mdgihtn/9fckO4ZiQ0ebeZIcQevvXv3DEx1B3jYhIiKm7RkQkxDS6RkQkxNRdIyISYukpPNewkryISAwR9cmLiIRXCjfkleRFRGJRn7yISIgpyYuIhJj65EVEQkyja0REQkzdNSIiIaYnXkVEQkxz14iIhFgKd8mndOwiIgmRZvEv8TCzfma2yszWmNldVdS71szczM4Oyt8xs/1mtixY8mKdSy15EZEY6qXVXneNmUWACUAfYCOwxMzmuPvKMvVOBG4DFpU5xFp37xLv+dSSFxGJoZZb8t2ANe7+mbt/C7wIXFlBvTHAw8A3RxT7kewsInIsqE6SN7MhZrY0ahlS5nAnAxuiyhuDdSXM7PtAjrvPrSCcdmb2gZm9ZWYXxopd3TUiIjFUpzXs7pOASTU9l5mlAY8BP6lg81dAG3fPN7OzgD+b2Rnuvruy46klLyISg1n8Sxw2ATlR5dbBusNOBDoBC8xsHdAdmGNmZ7v7AXfPB3D394G1wKlVnUwteRGRGGr5idclQK6ZtaM4uQ8Ebji80d13Ac0Ol81sAfBLd19qZs2BAncvNLP2QC7wWVUnU5IXEYmhNrs83P2QmQ0D5gERYLK7f2xmo4Gl7j6nit17AKPN7CBQBAx194KqzqckLyISg9XyE6/u/grwSpl1Iyupe3HU55nAzOqcS0leRCSGFJ66RkleRCSWOG+o1klK8iIiMaRwjleSFxGJRVMNi4iEmLprRERCLIVzvJK8iEgsSvIiIiGWyu941dw1teDD9z7hV9c/yP/7P/fz8nPzK623ZMFybrzgDj77dEOp9ds372Bwn7uYO+1vRzvUlLVw4fv07TuUPn2GMGnS9HLbp0z5M5dddguXXz6cQYP+h02btpZs+/LLrdx88z1ceun/5bLLbmHjxi2JDD008sb9nC/+mcfS18cmO5SEs2osdU2VSd7MsqLeQLLZzDZFlevHexIzyzSzoZVsa2Bmi4NjrjSzCp/6qquKCouY+tgsfvXIEB5+fgT/eOOfbPp8c7l6+/d9w7zpf+e7HduU2zbtydmceU6HRISbkgoLCxk9Oo9nnhnF3LkT+MtfFrJmzfpSdTp0aM/MmY/x8stP0Lfv+YwbN6Vk24gR4/npT6/h1VefYvr0R8nKapzorxAKz01/iyt//FCyw0iKNPO4l7qmyiTv7vnu3iV4C0keMP5wOZjsPl6ZQIVJHtgP/Edwju8BVxx+1VUqWPvJerJbN6PFyVmk10une++uvP/2R+Xqzfzdq/zwRz2pV79eqfVLF66geatMWrfLTlTIKefDD1fTtm0rcnJaUr9+Pfr378H8+aVfltO9+5lkZBwPQJcup7F5cz4Aa9as59ChQs4/vysADRpk
lNST6nln8acU7NyT7DCSopZnoUyoGnfXmNmgqBb4RDNLM7N2ZrY6aLlHzOxdM+sJPAScFtQt1RRw9yJ33xsU6wP1gLr357ASO7btIrNFk5JyZvMm7Ni2q1Sddas2kr91J13O61hq/Tf7DjD3j29y9U19ExJrqtqyJZ+WLUsm5SM7O4stW/IrrT9jxuv06HEWAOvWbaJRowYMG/YAV111Gw8/PJnCwsKjHrOES1o1lrqmRjGZWSfgauC8oAWeDgx098+BR4GJwJ3AB+7+JnAXsCr4BVDupbVmVt/MlgFbgL8E8ySHQlFREX98YjY3DCv/dq9Zk+fR77qLOP6E45IQWTjNnv03PvpoDYMHXwPAoUNFLF26khEjbmbGjMfYuHEzs2ZVft9EpCKp3JKv6eia3sAPgKVW/K0yCF5n5e55ZjYAuAnoGs/Bgq6fLmbWFPhfM+vg7p+UrRe8RmsIwF2PDOPqH/erYfi1p2nzxhRs3VlSLti2k6bN/93n+82+A2z8fDMPDJ8AwK6Crxk/4lluf/inrF35BUsWLOfFp15m3579mBn1j0unz7Ux3+h1TMnOzmLz5u0l5S1b8snOzipX7913l5GX9xLPP/8g9YNusZYts+jQoR05OS0B6NWrO8uXr0pM4BIadTB3x62mSd4ongP5nnIbzBoCJ1E8T3JDYG/ZOpVx9x1mthDoC5RL8tGv1Vq8bW6d6NJpf3oOmzdsY+uX+WQ2b8x7b3zALffeWLL9hIYZPDV3TEn5/mETuH7YFbQ/PYd7Jg4vWT/r2dc4LuM4JfgKdO6cy7p1X7Jhw2ays7OYO3chjz76y1J1Vq5cy8iRE3jmmfvIympSat/du/dSULCLzMzGLFr0IZ065Sb6K0iKS+UhlDVN8m8AM8zscXffbmZZQAN3Xw+MA6ZQ3PXyNHAV8DXFr7Qqx8xaAAfcfZeZnUDxr4TRNYwr4SLpEX58xzWMu2MSRUVF9OjfjdbtWzLzmVdpd3oO37+gU7JDTHnp6RFGjhzK4MH3UlhYxLXX9iY3ty2PP/48nTrl0qvXOYwdO4V9+77httuKb/m0atWcvLx7iEQijBhxM4MG3Q04Z5zxXQYMuCS5XyhFTX1iOBee24FmTU9kzaInGfPYDKb+aUGyw0qIVE7y5h5fg9jMRgF73P2RoHwDxf3uacBBikfPNALGABcGr6eaA0x39+fM7CWgAzA3ul/ezLoAv6f410EEeMHd748VT11pyYdVt+Zq7SZCRpt7kx1C6O1f/8IRp+iv9r0cd75pdcLldepPQtwteXcfVaY8DZhWQdX5UXWuiPp8XSXHXQZ0iTcOEZFEq+03QyWSpjUQEYmhTjXNq0lJXkQkhro4NDJeSvIiIjFEkh3AEVCSFxGJQS15EZFQS90sryQvIhKDKcmLiISXWV2ceiw+SvIiIjGpJS8iElpWJycRjo+SvIhIDOquEREJNXXXiIiElkbXiIiEmJK8iEiImaXuxAZK8iIiMaklLyISWuquEREJtdQdQpm6kYuIJIhV47+4jmfWz8xWmdkaM7urgu1DzWyFmS0zs7fNrGPUtl8H+60ys76xzqWWvIhIDFaLcw1b8V3cCUAfYCOwxMzmuPvKqGrT3D0vqH8F8BjQL0j2A4EzgJOAN8zsVHcvrOx8asmLiMRgROJe4tANWOPun7n7t8CLwJXRFdx9d1SxAXD4JbNXAi+6+wF3/xxYExyvUkryIiIxWdyLmQ0xs6VRy5AyBzsZ2BBV3hisK31Gs1vNbC0wFviv6uwbTd01IiIxVKe7xt0nAZOO9JzuPgGYYGY3AHcDg2pyHLXkRURiir8lH4dNQE5UuXWwrjIvAlfVcF8leRGRWIy0uJc4LAFyzaydmdWn+EbqnFLnM8uNKvYHVgef5wADzew4M2sH5AKLqzqZumtERGKqvdE17n7IzIYB84AIMNndPzaz0cBSd58DDDOz3sBBYAdBV01Q7yVgJXAIuLWqkTWgJC8iElNaLc8n7+6v
AK+UWTcy6vNtVex7P3B/vOdSkhcRiSl1e7aV5EVEYtDcNSIioaYkLyISWrU5rUGiKcmLiMQQ53QFdZK5e+xacsTMbEjwJJwcJbrGiaHrnFpS95Zx6ik7f4XUPl3jxNB1TiFK8iIiIaYkLyISYkryiaM+zKNP1zgxdJ1TiG68ioiEmFryIiIhpiQvIhJiSvJxMrOs4M3py8xss5ltiirXr8ZxMs1saBXbp5rZNjNbVjuRp5ZEXGcza2tmC8xspZl9HEz7esxI0DVuYGaLg2OuNLORFdWTo0998jVgZqOAPe7+SA32PQWY4e5dKtl+EbAfmFRZnWPF0brOZnYS0MLdl5lZI+AD4FJ3/9eRxpxqjuI1TgMy3H2vmdUD/gEMdfelRxqzVI9a8rXAzAZFtVommlla8NaX1UFrJ2Jm75pZT+Ah4LSg7kNlj+XubwEFCf8SKaC2rrO7f+nuy4LPu4FPifEy5GNFLV7jInffGxTrA/UAtSiTQHPXHCEz6wRcDZwXvPFlEjDQ3aeZ2aPARGA58IG7v2lm64FTjvVWenUdretsZu2BThS/ku2YVtvXOOj6WQycAjzu7u8n5ptINCX5I9cb+AGwNJipLgPYAODueWY2ALgJ6Jq0CMOh1q9z0FUzExju7ntqPeLUU6vX2N2/BbqYWVPgf82sg7t/clQil0opyR85o/gdjfeU22DWEDiJ4vc4NgT2lq0jcavV6xy0MmcBU4J3aspR+rfs7jvMbCHQF1CSTzD1yR+5N4DrzKwZlIxcaBNsGwdMAUYDTwfrvgZOTHiUqa/WrrMVN1N/Dyxz998ezaBTTG1e4xZm1jj4fALFvxI+PYqxSyWU5I+Qu68A7gPeMLMPgb8C2WbWC/ge8Ki7TwXSzOxGd98CvG9mKyq68Wpm04G/Ax3NbKOZ/SRhX6YOq+XrfBFwPdAnauhg3wR+nTqplq/xScBbZrac4n75ue7+WuK+jRymIZQiIiGmlryISIgpyYuIhJiSvIhIiCnJi4iEmJK8iEiIKcmLiISYkryISIj9f471KLrXO1ygAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 2 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "num_texts = len(tf_vectors)\n",
    "similarities = np.zeros((num_texts, num_texts))\n",
    "# Normalize every term-frequency vector to unit length.\n",
    "unit_vectors = np.array([vector / norm(vector) for vector in tf_vectors])\n",
    "# Compute the similarity of every pair of texts, one pair at a time.\n",
    "for i, vector_a in enumerate(unit_vectors):\n",
    "    for j, vector_b in enumerate(unit_vectors):\n",
    "        similarities[i][j] = normalized_tanimoto(vector_a, vector_b)\n",
    "\n",
    "labels = ['Text 1', 'Text 2', 'Text 3']\n",
    "sns.heatmap(similarities, cmap='YlGnBu', annot=True,\n",
    "            xticklabels=labels, yticklabels=labels)\n",
    "plt.yticks(rotation=0)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The table is informative. We can immediately tell which pairs of texts share the highest similarity and which share the lowest. For instance, `text1` is more similar to `text2` (0.51) than to `text3` (0.44). However, our table computation relied on inefficient nested for-loops. We can eliminate these loops using matrix multiplication. First, however, we need to introduce basic matrix operations.\n",
    "\n",
    "### 13.3.1. Basic Matrix Operations\n",
    "\n",
    "A **matrix** is the extension of a 1-dimensional vector to 2 dimensions. In other words, a matrix is just a table of numbers. Since matrices are tables, they can be analyzed using Pandas. Alternatively, numeric tables can be handled using 2D NumPy arrays. Both matrix representations are valid. In fact, Pandas DataFrames and NumPy arrays can sometimes be used interchangeably, because they share certain attributes, such as `shape` and the transpose `T`.\n",
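    "\n",
    "A minimal sketch, using a small made-up matrix, of the attributes that a 2D NumPy array and a Pandas DataFrame share, and of converting a DataFrame back into an array with `DataFrame.to_numpy`:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# A made-up 2 x 3 matrix, stored both as an array and as a DataFrame.\n",
    "matrix_array = np.array([[1.0, 2.0, 3.0],\n",
    "                         [4.0, 5.0, 6.0]])\n",
    "matrix_frame = pd.DataFrame(matrix_array)\n",
    "\n",
    "# Both structures expose a shape attribute and a transpose (T).\n",
    "assert matrix_array.shape == matrix_frame.shape == (2, 3)\n",
    "assert matrix_array.T.shape == matrix_frame.T.shape == (3, 2)\n",
    "\n",
    "# Converting the DataFrame back to an array preserves every value.\n",
    "assert np.array_equal(matrix_frame.to_numpy(), matrix_array)\n",
    "print(matrix_frame.T)\n",
    "```\n",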
    "\n",
    "**Listing 13. 33. Comparing Pandas and NumPy matrix attributes**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Our Pandas DataFrame contains 3 rows and 15 columns\n",
      "Our 2D NumPy array contains 3 rows and 15 columns\n"
     ]
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "matrices = [pd.DataFrame(unit_vectors), unit_vectors]\n",
    "matrix_types = ['Pandas DataFrame', '2D NumPy array']\n",
    "\n",
    "for matrix_type, matrix in zip(matrix_types, matrices):\n",
    "    row_count, column_count = matrix.shape\n",
    "    print(f\"Our {matrix_type} contains \"\n",
    "          f\"{row_count} rows and {column_count} columns\")\n",
    "    assert (column_count, row_count) == matrix.T.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The Pandas and NumPy table structures are similar. Nonetheless, storing matrices in 2D NumPy arrays has certain benefits. One immediate benefit is that NumPy arrays support Python's built-in arithmetic operators, which are applied element by element.\n",
    "\n",
    "#### NumPy Matrix Arithmetic Operations\n",
    "\n",
    "Doubling the values of a matrix is very easy in NumPy. For example, we can double our `similarities` matrix by running `2 * similarities`. We can also add `similarities` directly to itself by running `similarities + similarities`. Of course, the 2 arithmetic outputs will be equal. Meanwhile, running `similarities - similarities` will return a matrix of zeros. Furthermore, running `similarities - similarities - 1` will subtract 1 from each of those zeros, returning a matrix of -1s.\n",
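    "\n",
    "Expressions such as `similarities - 1` rely on NumPy **broadcasting**: the scalar behaves as if it were stretched into a matrix of the same shape. A minimal sketch on a small, made-up matrix (`toy_matrix` is hypothetical):\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "toy_matrix = np.array([[1.0, 0.5],\n",
    "                       [0.5, 1.0]])\n",
    "\n",
    "# Subtracting a scalar equals subtracting a same-shaped matrix of that scalar.\n",
    "shifted = toy_matrix - 1\n",
    "assert np.array_equal(shifted, toy_matrix - np.ones_like(toy_matrix))\n",
    "\n",
    "# Adding a matrix to itself equals multiplying it by the scalar 2.\n",
    "assert np.array_equal(toy_matrix + toy_matrix, 2 * toy_matrix)\n",
    "print(shifted)\n",
    "```\n",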
    "\n",
    "**Listing 13. 34. NumPy array addition and subtraction**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [],
   "source": [
    "double_similarities = 2 * similarities\n",
    "assert np.array_equal(double_similarities, similarities + similarities)\n",
    "zero_matrix = similarities - similarities\n",
    "negative_1_matrix = similarities - similarities - 1\n",
    "\n",
    "# Confirm every element-wise result, one entry at a time.\n",
    "for i in range(similarities.shape[0]):\n",
    "    for j in range(similarities.shape[1]):\n",
    "        assert double_similarities[i][j] == 2 * similarities[i][j]\n",
    "        assert zero_matrix[i][j] == 0\n",
    "        assert negative_1_matrix[i][j] == -1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the same manner, we can multiply and divide NumPy arrays element by element. Running `similarities / similarities` will divide each similarity by itself, thus returning a matrix of ones. (This works because none of our similarities equal zero.) Meanwhile, running `similarities * similarities` will return a matrix of squared similarity values.\n",
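    "\n",
    "Dividing by zero, however, produces `nan` (for 0/0) or `inf` values along with a runtime warning. A minimal sketch of one way to guard against that, assuming a made-up matrix that contains a zero, uses the `out` and `where` parameters of `np.divide`:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "toy_matrix = np.array([[1.0, 0.0],\n",
    "                       [0.25, 2.0]])\n",
    "\n",
    "# Divide only where the denominator is nonzero; entries where the\n",
    "# condition is False keep the value already in out (0.0 here).\n",
    "ratios = np.divide(toy_matrix, toy_matrix,\n",
    "                   out=np.zeros_like(toy_matrix),\n",
    "                   where=toy_matrix != 0)\n",
    "print(ratios)\n",
    "```\n",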
    "\n",
    "**Listing 13. 35. NumPy array multiplication and division**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [],
   "source": [
    "squared_similarities = similarities * similarities\n",
    "assert np.array_equal(squared_similarities, similarities ** 2)\n",
    "ones_matrix = similarities / similarities\n",
    "\n",
    "for i in range(similarities.shape[0]):\n",
    "    for j in range(similarities.shape[1]):\n",
    "        assert squared_similarities[i][j] == similarities[i][j] ** 2\n",
    "        assert ones_matrix[i][j] == 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Matrix arithmetic lets us conveniently convert between types of similarity matrices. For instance, we can convert our Tanimoto matrix into a cosine similarity matrix simply by running `2 * similarities / (1 + similarities)`.\n",
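    "\n",
    "The conversion formula follows from the vector form of the Tanimoto similarity. For unit vectors $u$ and $v$, we have $\\lVert u \\rVert = \\lVert v \\rVert = 1$. Writing $c = u \\cdot v$ for the cosine similarity and $T$ for the Tanimoto similarity:\n",
    "\n",
    "$$T = \\frac{u \\cdot v}{\\lVert u \\rVert^2 + \\lVert v \\rVert^2 - u \\cdot v} = \\frac{c}{2 - c}$$\n",
    "\n",
    "$$T(2 - c) = c \\quad \\Longrightarrow \\quad 2T = c(1 + T) \\quad \\Longrightarrow \\quad c = \\frac{2T}{1 + T}$$\n",
    "\n",
    "Applying the final formula to every element of the Tanimoto matrix is exactly what `2 * similarities / (1 + similarities)` does.\n",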
    "\n",
    "**Listing 13. 36. Converting between matrix similarity-types**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [],
   "source": [
    "cosine_similarities = 2 * similarities / (1 + similarities)\n",
    "# Each converted value should match the dot product of the unit vectors.\n",
    "for i in range(similarities.shape[0]):\n",
    "    for j in range(similarities.shape[1]):\n",
    "        cosine_sim = unit_vectors[i] @ unit_vectors[j]\n",
    "        assert round(cosine_similarities[i][j], 15) == round(cosine_sim, 15)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "NumPy 2D arrays confer additional benefits over Pandas. For instance, accessing rows and columns by index is much more straightforward in NumPy.\n",
    "\n",
    "#### NumPy Matrix Row and Column Operations\n",
    "\n",
    "Given any 2D `matrix` array, we can access the row at index `i` by running `matrix[i]`. Likewise, we can access the column at index `j` by running `matrix[:, j]`.\n",
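    "\n",
    "A minimal sketch of these indexing rules on a small, made-up matrix. It also shows that a column is simply a row of the transpose, and that indexing with a list (`matrix[:, [j]]`) keeps the result two-dimensional:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "matrix = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]\n",
    "\n",
    "row = matrix[1]          # row at index 1\n",
    "column = matrix[:, 2]    # column at index 2\n",
    "\n",
    "assert np.array_equal(row, [3, 4, 5])\n",
    "assert np.array_equal(column, [2, 5])\n",
    "\n",
    "# A column of the matrix is a row of its transpose.\n",
    "assert np.array_equal(column, matrix.T[2])\n",
    "\n",
    "# Indexing with a list returns a 2D column of shape (2, 1).\n",
    "assert matrix[:, [2]].shape == (2, 1)\n",
    "```\n",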
    "\n",
    "Let's use NumPy indexing to print the first row and column (at index 0) of both `similarities` and `unit_vectors`.\n",
    "\n",
    "**Listing 13. 37. Accessing NumPy matrix rows and columns**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Accessing rows and columns in the Similarities Matrix.\n",
       "Row at index 0 is:\n",
       "[1.         0.51442439 0.44452044]\n",
       "\n",
       "Column at index 0 is:\n",
       "[1.         0.51442439 0.44452044]\n",
       "\n",
       "Accessing rows and columns in the Unit Vectors Matrix.\n",
       "Row at index 0 is:\n",
       "[0.         0.40824829 0.40824829 0.40824829 0.40824829 0.\n",
       " 0.         0.         0.         0.40824829 0.         0.40824829\n",
       " 0.         0.         0.        ]\n",
       "\n",
       "Column at index 0 is:\n",
       "[0.         0.         0.30151134]\n",
       "\n"
      ]
     }
    ],
    "source": [
     "for name, matrix in [('Similarities', similarities),\n",
     "                     ('Unit Vectors', unit_vectors)]:\n",
     "    print(f\"Accessing rows and columns in the {name} Matrix.\")\n",
     "    row, column = matrix[0], matrix[:, 0]\n",
     "    print(f\"Row at index 0 is:\\n{row}\")\n",
     "    print(f\"\\nColumn at index 0 is:\\n{column}\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "All printed rows and columns are 1-dimensional NumPy arrays. Given two such arrays, we can compute their dot product, but only if they have the same length. In our output, both `similarities[0].size` and `unit_vectors[:, 0].size` are equal to 3. Hence, we can take the dot product between the first row of `similarities` and the first column of `unit_vectors`. In fact, every row of `similarities` and every column of `unit_vectors` contains 3 elements, so we can take the dot product of any such row and column.\n",
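    "\n",
    "As a reminder of what `@` computes for two 1D arrays of equal length, here is a minimal sketch with made-up values: the dot product multiplies matching elements and sums the results.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "row = np.array([1.0, 0.5, 0.25])\n",
    "column = np.array([0.0, 2.0, 4.0])\n",
    "\n",
    "# 1.0 * 0.0 + 0.5 * 2.0 + 0.25 * 4.0\n",
    "dot_product = row @ column\n",
    "assert dot_product == (row * column).sum()\n",
    "print(dot_product)\n",
    "```\n",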
    "\n",
    "**Listing 13. 38. Computing the dot product between a row and column**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The dot product between row 0 column 0 is: 0.1340\n",
      "The dot product between row 0 column 1 is: 0.5509\n",
      "The dot product between row 0 column 2 is: 0.5423\n",
      "The dot product between row 0 column 3 is: 0.8276\n",
      "The dot product between row 0 column 4 is: 0.6850\n",
      "The dot product between row 0 column 5 is: 0.1340\n",
      "The dot product between row 0 column 6 is: 0.1340\n",
      "The dot product between row 0 column 7 is: 0.1340\n",
      "The dot product between row 0 column 8 is: 0.1340\n",
      "The dot product between row 0 column 9 is: 0.5423\n",
      "The dot product between row 0 column 10 is: 0.1427\n",
      "The dot product between row 0 column 11 is: 0.8276\n",
      "The dot product between row 0 column 12 is: 0.1427\n",
      "The dot product between row 0 column 13 is: 0.1340\n",
      "The dot product between row 0 column 14 is: 0.1427\n",
      "The dot product between row 1 column 0 is: 0.0797\n",
      "The dot product between row 1 column 1 is: 0.4874\n",
      "The dot product between row 1 column 2 is: 0.2897\n",
      "The dot product between row 1 column 3 is: 0.8444\n",
      "The dot product between row 1 column 4 is: 0.5671\n",
      "The dot product between row 1 column 5 is: 0.0797\n",
      "The dot product between row 1 column 6 is: 0.0797\n",
      "The dot product between row 1 column 7 is: 0.0797\n",
      "The dot product between row 1 column 8 is: 0.0797\n",
      "The dot product between row 1 column 9 is: 0.2897\n",
      "The dot product between row 1 column 10 is: 0.2774\n",
      "The dot product between row 1 column 11 is: 0.8444\n",
      "The dot product between row 1 column 12 is: 0.2774\n",
      "The dot product between row 1 column 13 is: 0.0797\n",
      "The dot product between row 1 column 14 is: 0.2774\n",
      "The dot product between row 2 column 0 is: 0.3015\n",
      "The dot product between row 2 column 1 is: 0.2548\n",
      "The dot product between row 2 column 2 is: 0.4830\n",
      "The dot product between row 2 column 3 is: 0.6296\n",
      "The dot product between row 2 column 4 is: 0.5563\n",
      "The dot product between row 2 column 5 is: 0.3015\n",
      "The dot product between row 2 column 6 is: 0.3015\n",
      "The dot product between row 2 column 7 is: 0.3015\n",
      "The dot product between row 2 column 8 is: 0.3015\n",
      "The dot product between row 2 column 9 is: 0.4830\n",
      "The dot product between row 2 column 10 is: 0.0733\n",
      "The dot product between row 2 column 11 is: 0.6296\n",
      "The dot product between row 2 column 12 is: 0.0733\n",
      "The dot product between row 2 column 13 is: 0.3015\n",
      "The dot product between row 2 column 14 is: 0.0733\n"
     ]
    }
   ],
   "source": [
    "num_rows = similarities.shape[0]\n",
    "num_columns = unit_vectors.shape[1]\n",
    "for i in range(num_rows):\n",
    "    for j in range(num_columns):\n",
    "        row = similarities[i]\n",
    "        column = unit_vectors[:,j]\n",
    "        dot_product = row @ column\n",
    "        print(f\"The dot product between row {i} column {j} is: \"\n",
    "              f\"{dot_product:.4f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We've generated 45 dot products, one for each of the 3 x 15 row-column combinations. These outputs can be stored more concisely in a matrix called `dot_products`, where `dot_products[i][j]` is equal to `similarities[i] @ unit_vectors[:, j]`.\n",
    "\n",
    "**Listing 13. 40. Storing all-by-all dot products in a matrix**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[0.13402795 0.55092394 0.54227624 0.82762755 0.6849519  0.13402795\n",
      "  0.13402795 0.13402795 0.13402795 0.54227624 0.14267565 0.82762755\n",
      "  0.14267565 0.13402795 0.14267565]\n",
      " [0.07969524 0.48736297 0.28970812 0.84440831 0.56705821 0.07969524\n",
      "  0.07969524 0.07969524 0.07969524 0.28970812 0.2773501  0.84440831\n",
      "  0.2773501  0.07969524 0.2773501 ]\n",
      " [0.30151134 0.25478367 0.48298605 0.62960397 0.55629501 0.30151134\n",
      "  0.30151134 0.30151134 0.30151134 0.48298605 0.07330896 0.62960397\n",
      "  0.07330896 0.30151134 0.07330896]]\n"
     ]
    }
   ],
   "source": [
    "dot_products = np.zeros((num_rows, num_columns))\n",
    "for i in range(num_rows):\n",
    "    for j in range(num_columns):\n",
    "        dot_products[i][j] = similarities[i] @ unit_vectors[:,j]\n",
    "\n",
    "print(dot_products)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The operation we've just executed is called a **matrix product**. Given two matrices `matrix_a` and `matrix_b`, we can compute their product `matrix_c`, where `matrix_c[i][j]` is equal to `matrix_a[i] @ matrix_b[:, j]`. The product is only defined when the number of columns in `matrix_a` equals the number of rows in `matrix_b`.\n",
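    "\n",
    "The definition translates directly into a triple loop. Below is a minimal pure-Python sketch; the function name `matrix_product_python` is hypothetical. An m x n matrix times an n x p matrix yields an m x p matrix:\n",
    "\n",
    "```python\n",
    "def matrix_product_python(matrix_a, matrix_b):\n",
    "    '''Multiply two matrices stored as lists of lists (a hypothetical helper).'''\n",
    "    num_rows, inner = len(matrix_a), len(matrix_a[0])\n",
    "    if len(matrix_b) != inner:\n",
    "        raise ValueError('columns of matrix_a must equal rows of matrix_b')\n",
    "    num_columns = len(matrix_b[0])\n",
    "\n",
    "    matrix_c = [[0.0] * num_columns for _ in range(num_rows)]\n",
    "    for i in range(num_rows):\n",
    "        for j in range(num_columns):\n",
    "            # matrix_c[i][j] is the dot product of row i of matrix_a\n",
    "            # and column j of matrix_b.\n",
    "            matrix_c[i][j] = sum(matrix_a[i][k] * matrix_b[k][j]\n",
    "                                 for k in range(inner))\n",
    "    return matrix_c\n",
    "\n",
    "\n",
    "# A 3 x 2 matrix times a 2 x 3 matrix gives a 3 x 3 matrix.\n",
    "result = matrix_product_python([[1, 2], [3, 4], [5, 6]],\n",
    "                               [[1, 0, 1], [0, 1, 1]])\n",
    "print(result)\n",
    "```\n",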
    "\n",
    "#### NumPy Matrix Products\n",
    "\n",
    "Conveniently, NumPy's `@` operator can be applied to 2D matrices as well as to 1D arrays. If `matrix_a` and `matrix_b` are both NumPy arrays, then `matrix_c` will equal `matrix_a @ matrix_b`.\n",
    "\n",
    "**Listing 13. 41. Computing a matrix product using NumPy**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [],
   "source": [
    "matrix_product = similarities @ unit_vectors\n",
    "assert np.allclose(matrix_product, dot_products)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Suppose we flip our input matrices and run `unit_vectors @ similarities`. What happens? NumPy throws an error! The computation would require dot products between the rows of `unit_vectors` and the columns of `similarities`. However, each row of `unit_vectors` contains 15 elements, while each column of `similarities` contains only 3. Therefore, the computation is not possible.\n",
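    "\n",
    "A minimal sketch with made-up matrices of the same shapes as ours (3 x 3 for `similarities`, 3 x 15 for `unit_vectors`). NumPy raises a `ValueError` when the inner dimensions disagree; transposing the first operand is one way to make them agree:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "square = np.ones((3, 3))   # stands in for similarities\n",
    "wide = np.ones((3, 15))    # stands in for unit_vectors\n",
    "\n",
    "# (3 x 3) @ (3 x 15): inner dimensions match, so the result is 3 x 15.\n",
    "assert (square @ wide).shape == (3, 15)\n",
    "\n",
    "# (3 x 15) @ (3 x 3): inner dimensions 15 and 3 disagree.\n",
    "try:\n",
    "    wide @ square\n",
    "    shapes_compatible = True\n",
    "except ValueError as error:\n",
    "    shapes_compatible = False\n",
    "    print(f'ValueError: {error}')\n",
    "\n",
    "# Transposing gives (15 x 3) @ (3 x 3), a valid 15 x 3 product.\n",
    "assert (wide.T @ square).shape == (15, 3)\n",
    "```\n",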
    "\n",
    "**Listing 13. 42. Computing an erroneous matrix product**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "We can't compute the matrix product\n"
     ]
    }
   ],
   "source": [
    "try:\n",
    "    matrix_product = unit_vectors @ similarities\n",
    "except ValueError:\n",
    "    print(\"We can't compute the matrix product\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In mathematics, the words _product_ and _multiplication_ are often interchangeable. Thus, computing the matrix product is commonly called **matrix multiplication**. That name is so ubiquitous that NumPy includes an `np.matmul` function. The output of `np.matmul(matrix_a, matrix_b)` is identical to `matrix_a @ matrix_b`.\n",
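    "\n",
    "For 2D arrays, `np.dot` computes the same matrix product as `np.matmul` and `@`; the functions behave differently only for arrays with more than 2 dimensions, or for scalar operands, which `np.matmul` rejects. A minimal sketch with made-up matrices:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "matrix_a = np.array([[1.0, 2.0], [3.0, 4.0]])\n",
    "matrix_b = np.array([[0.0, 1.0], [1.0, 0.0]])\n",
    "\n",
    "product_at = matrix_a @ matrix_b\n",
    "product_matmul = np.matmul(matrix_a, matrix_b)\n",
    "product_dot = np.dot(matrix_a, matrix_b)\n",
    "\n",
    "# All three spellings agree for 2D inputs.\n",
    "assert np.array_equal(product_at, product_matmul)\n",
    "assert np.array_equal(product_at, product_dot)\n",
    "\n",
    "# Multiplying by matrix_b swaps the columns of matrix_a.\n",
    "print(product_at)\n",
    "```\n",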
    "\n",
    "**Listing 13. 43. Running matrix multiplication using `matmul`**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [],
   "source": [
    "matrix_product = np.matmul(similarities, unit_vectors)\n",
    "assert np.array_equal(matrix_product,\n",
    "                      similarities @ unit_vectors)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's compare the matrix-product speed of NumPy and regular Python. We'll generate 100 matrices of different sizes, composed entirely of ones. We'll multiply every matrix by itself, using both NumPy and Python for-loops, and time each multiplication using Python's built-in `time` module. Finally, we'll plot matrix size versus running time for both the for-loop and the NumPy multiplications.\n",
    "\n",
    "**Listing 13. 44. Comparing matrix product running-times**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAY4AAAEGCAYAAABy53LJAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAAgAElEQVR4nOzdeZzN1f/A8dd7dmMfpsg2kn3sgwohQrJlD5FItPqWvvgqaf1VliQpSvYsqZiKqGwttqFJKFLEpLIvmRkzc+f9++Ne08yY5ZLrzvJ+Ph734d7zOZ9z3/fOdd/38znnc46oKsYYY4y7fLwdgDHGmNzFEocxxphLYonDGGPMJbHEYYwx5pJY4jDGGHNJ/LwdwNVQsmRJDQsL83YYxhiTq2zbtu2YqoamL88XiSMsLIyoqChvh2GMMbmKiPyWUbmdqjLGGHNJLHEYY4y5JJY4jDHGXJJ80ceRkcTERGJiYoiPj/d2KCYPCwoKomzZsvj7+3s7FGOuGI8mDhFpB7wG+ALvqOpL6bYHAnOBBsBxoJeqHhCREsBSoCEwW1UfyqDtSOB6VQ2/nNhiYmIoXLgwYWFhiMjlNGFMllSV48ePExMTQ8WKFb0djjFXjMdOVYmIL/AGcDtQA7hLRGqkqzYIOKmqNwCvAi+7yuOBp4ARmbTdFfj738QXHx9PiRIlLGkYjxERSpQoYUe1Js/xZB9HI2Cfqv6qqgnAIqBzujqdgTmu+0uBViIiqnpOVb/GmUDSEJFCwGPA8/82QEsaxtPsM2byIk8mjjLAoVSPY1xlGdZR1STgNFAim3afAyYCsVcmTGOMyXt+/vln/ve//5GcnHzF285Vo6pEpC5QSVU/cqPuEBGJEpGoo0ePXoXoLp2I8Pjjj6c8njBhAuPGjbsibY8bN44yZcpQt25dwsPDiYyMvKR9RYR9+/allE2ePBkRyfZCysmTJxMbm3lOHzx4MLt373Y7lrCwMGrVqkXt2rVp06YNf/75p9v7prdu3To6dOhwWfsuW7bskuI2xpsOHjxI69atefvtt/n999+vePueTBy/A+VSPS7rKsuwjoj4AUVxdpJn5iYgQkQOAF8DVURkXUYVVXWGqkaoakRo6EVXzOcIgYGBfPjhhxw7dswj7f/nP/8hOjqa999/n3vvvfeSfnnUqlWLRYsWpTx+//33qVmzZrb7ZZU4HA4H77zzDjVqpO/qytratWvZsWMHERERvPjiixm262mWOExu8ddff3Hbbbdx6tQpVq9eTbly5bLf6RJ5MnFsBSqLSEURCQB6A+l/9kYCA1z3uwNrNIslCVX1TVW9TlXDgKbAXlVtccUjv0r8/PwYMmQIr7766kXb7rnnHpYuXZryuFChQoDzV3Pz5s3p3Lkz119/PaNGjWLBggU0atSIWrVq8csvv1zUVvXq1fHz8+PQoUNUrFiRxMREAM6cOZPmcWpdunRh+fLlAPzyyy8ULVqUkiVLpmwfNmwYERER1KxZk6effhqAKVOmcPjwYVq2bEnLli1T4n788cepU6cOGzdupEWLFkRFRfHbb79RuXJljh07RnJyMs2aNWP16tVZvl+33HJLylFQ+na//PJL6tWrR61atbj33ns5f/48AJ999hnVqlWjfv36fPjhhyltjRs3jgkTJqQ8Dg8P58CBAwDMnTuX2rVrU6dOHe6++26+/fZbIiMjeeKJJ6hbt26G77ExOcGpU6do27Ythw4d4tNPP6VevXoeeR6PDcdV1SQReQhYhXM47ruquktEngWiVDUSmAnME5F9wAmcyQUA11FFESBARLoAbVTVYz/5ek3feFFZh9qlufumMOISHNwza8tF27s3KEuPiHKcOJfAsPnb0mxbfP9Nbj3vgw8+SO3atfnvf//rdqzff/89P/74IyEhIVx//fUMHjyYLVu28Nprr/H6668zefLkNPU3b96Mj48P5cuXp0WLFnz66ad0
6dKFRYsW0bVr1wyvMShSpAjlypVj586dLF++nF69ejFr1qyU7S+88AIhISE4HA5atWrFjh07eOSRR5g0aRJr165NSTLnzp2jcePGTJw4MU37FSpUYOTIkQwbNoxGjRpRo0YN2rRpk+Xr/uSTT6hVq9ZF7cbHx1O5cmW+/PJLqlSpQv/+/XnzzTcZOnQo9913H2vWrOGGG26gV69e2b63u3bt4vnnn+fbb7+lZMmSnDhxgpCQEDp16kSHDh3o3r17tm0Y4y0XTgV//PHHNG3a1GPP49E+DlVdoapVVLWSqr7gKhvrShqoaryq9lDVG1S1kar+mmrfMFUNUdVCqlo2fdJQ1QOXew1HTlKkSBH69+/PlClT3N6nYcOGlC5dmsDAQCpVqpTyhVurVq2UX80Ar776KnXr1mXEiBEsXrwYEWHw4MEpCWDWrFkMHDgw0+fp3bs3ixYtYtmyZdx5551pti1ZsoT69etTr149du3alelpHF9fX7p165bhtsGDB3PmzBneeuutNL/+02vZsiV169blzJkzjB49+qJ29+zZQ8WKFalSpQoAAwYMYMOGDfz0009UrFiRypUrIyL069cv0+e4YM2aNfTo0SMl8YWEhGS7jzE5wccff8wHH3zAM888Q9u2bT36XPn2yvH0sjpCKBDgm+X2kIIBbh9hZGT48OHUr18/zZe4n59fSp9EcnIyCQkJKdsCAwNT7vv4+KQ89vHxISkpKWXbf/7zH0aMSHspTJMmTThw4ADr1q3D4XAQHp557u3QoQNPPPEEERERFClSJKV8//79TJgwga1bt1K8eHHuueeeTK9VCAoKwtfXN8NtsbGxxMTEAPD3339TuHDhDOulPoJxp113pH5/AbvWwuRqf//9Nw8++CDh4eEX/Z/3hFw1qiqvCgkJoWfPnsycOTOlLCwsjG3bnKe/IiMjM+yHuFz9+/enT58+WR5tAAQHB/Pyyy8zZsyYNOVnzpyhYMGCFC1alL/++ouVK1embCtcuDBnz551K46RI0fSt29fnn32We67775LfyEuVatW5cCBAyn9H/PmzaN58+ZUq1aNAwcOpPRJLFy4MGWfsLAwtm/fDsD27dvZv38/ALfeeivvv/8+x487x2icOHHikl+XMVfbU089RUxMDDNmzLgq09tY4sghHn/88TSjq+677z7Wr1+f0vlbsGDBK/Zcffv25eTJk9x1113Z1u3duzf169dPU1anTh3q1atHtWrV6NOnD02aNEnZNmTIENq1a5fSOZ6Z9evXs3Xr1pTkERAQkKYP5VIEBQUxa9YsevToQa1atfDx8WHo0KEEBQUxY8YM7rjjDurXr88111yTsk+3bt04ceIENWvWZOrUqSmnuWrWrMmYMWNo3rw5derU4bHHHkt5H8aPH0+9evWsc9zkKFFRUUyZMoWhQ4dy002Xf+bjUkgWg5jyjIiICE1//cGPP/5I9erVvRSRdy1dupTly5czb948b4eSL+Tnz5rxrP3793PLLbfgcDj48ccfKVq06BVtX0S2qWpE+nLr48hnHn74YVauXMmKFSu8HYox5l84ePAgt956K+fOnWPt2rVXPGlkxRJHPvP66697OwRjzL/0+++/06pVK06cOMGXX35JnTp1rurzWx+HMcbkAomJiXzyySf07NmTSpUq8eeff7Jq1SoiIi46k+RxdsRhjDE5nMPhoHXr1mzYsIGSJUty//33M2zYMKpVq+aVeCxxGGNMDjdx4kQ2bNjA5MmTGTZsGAEBAV6NxxKHMcbkYLt37+app56ia9euPPLIIzlijRfr4/AiX19f6tatm3JLPV3IpUo/aZ8n3XPPPQQHB6e5IG748OGISLYz/WY0u21q7du359SpU27HcuE9DA8Pp0ePHllO6Z6d2bNn89BDF61S7Pa+hw8fvuznNiYjSUlJDBgwgCJFivDmm2/miKQBlji8qkCBAkRHR6fcwsLC3Nov9bQi3nLDDTekzJ6bnJzMmjVrKFMm/Tpd
F8sscagqycnJrFixgmLFirkdx4X3cOfOnQQEBPDWW29l2K6nWeIwnvDKK68QFRXFtGnT0lzA6m2WOHKY+Ph4Bg4cSK1atahXrx5r164FnF9MnTp14tZbb6VVq1Zutzdp0iTCw8MJDw9PM2tuRuUHDhygWrVq9O3bl+rVq9O9e/dMf8H37t2bxYsXA86p3ps0aYKf3z9nPrt06UKDBg2oWbMmM2bMAGDUqFHExcVRt25d+vbty4EDB6hatSr9+/cnPDycQ4cOERYWxrFjx9i6dSu1a9cmPj6ec+fOUbNmTXbu3Jnla23WrBn79u3LsN2FCxdSq1YtwsPDGTlyZMo+s2bNokqVKjRq1IhvvvkmpTyzae0BXn75ZWrVqkWdOnUYNWoUS5cuJSoqir59+1K3bl3i4uKy/bsYk53vvvuOcePG0atXL3r06OHtcNKwPg6cp1mio6OvaJt169a9aHrz9C58iQJUrFiRjz76iDfeeAMR4YcffuCnn36iTZs27N27F3DOqbRjxw63Z2zdtm0bs2bNYvPmzagqjRs3pnnz5iQnJ2dYXrx4cfbs2cPMmTNp0qQJ9957L9OmTctw0rQqVaoQGRnJyZMnWbhwIf369UszZ9W7775LSEgIcXFxNGzYkG7duvHSSy8xderUlPf6wIED/Pzzz8yZM4cbb7wxTfsNGzakU6dOPPnkk8TFxdGvX78sJ2RMSkpi5cqVtGvXDiBNu4cPH2bkyJFs27aN4sWL06ZNG5YtW0bjxo15+umn2bZtG0WLFqVly5bZrl+wcuVKli9fzubNmwkODk6Zdn3q1KlMmDDBK0MjTd4TFxdH3759CQ0NZdq0ad4O5yKWOLzowmmW1L7++msefvhhAKpVq0aFChVSEsdtt912SdN8f/3119x5550p81x17dqVr776ClXNsLxTp06UK1cuZe6pfv36MWXKlExn2+zatSuLFi1i8+bNTJ8+Pc22KVOm8NFHzhV+Dx06xM8//0yJEhcvJ1+hQoWLksYFY8eOpWHDhgQFBWU67Xzq5NusWTMGDRrE4cOH07S7detWWrRowYWVIPv27cuGDRsA0pT36tUr5b3OzBdffMHAgQMJDg4GbNp14xmjRo3ixx9/ZPXq1TnyM2aJA7I9MsgpUk90OGbMGD799FOAK3q0lL7zLavOuF69etGgQQMGDBiAj88/Zz3XrVvHF198wcaNGwkODqZFixaZTlue1eSNx48f5++//yYxMZH4+PgM62aUfLNr1x1ZTWtvjCd9/vnnTJkyhUceeYTbbrvN2+FkyPo4cphmzZqxYMECAPbu3cvBgwepWrXqRfVeeOGFlE71rNpatmwZsbGxnDt3jo8++ohmzZplWg7O+W82bnSuhvjee+9luYpYhQoVeOGFF3jggQfSlJ8+fZrixYsTHBzMTz/9xKZNm1K2+fv7uz1F/P33389zzz1H37590/RLXKpGjRqxfv16jh07hsPhYOHChTRv3pzGjRuzfv16jh8/TmJiIu+//37KPplNa3/bbbcxa9aslL4fm3bdXEk///wzd999N9WrV+ell17ydjiZsiOOHOaBBx5g2LBh1KpVCz8/P2bPnp1m4aasPP/882mOnmJiYrjnnnto1KgR4Fxx78I5/IzKL3Qqv/HGG9x7773UqFGDYcOGZfmc999//0Vl7dq146233qJ69epUrVo1zamoIUOGULt2berXr88LL7yQabtz587F39+fPn364HA4uPnmm1mzZg233nqrW+9FaqVLl+all16iZcuWqCp33HEHnTt3BpzDmG+66SaKFSuWcsoLnNPad+7cmTp16tCuXbuUI5h27doRHR1NREQEAQEBtG/fnhdffJF77rmHoUOHUqBAATZu3EiBAgUuOU6Tv/3222+0atUKh8PBBx98kKM/Qzatuklx4MABOnTokO3oJXNp7LNmsnP48GGaNWvGiRMnWLt2bZofMd5k06obY0wOlJSURNu2bTly5AhffPFFjkka
WfFoH4eItBORPSKyT0RGZbA9UEQWu7ZvFpEwV3kJEVkrIn+LyNRU9YNF5FMR+UlEdolIzj0JmAuFhYXZ0YYxV9nSpUvZuXMns2bNonHjxt4Oxy0eSxwi4gu8AdwO1ADuEpEa6aoNAk6q6g3Aq8DLrvJ44Ckgo3GgE1S1GlAPaCIit19ujPnhNJ3xLvuMmayoKhMmTKBq1ap07drV2+G4zZNHHI2Afar6q6omAIuAzunqdAbmuO4vBVqJiKjqOVX9GmcCSaGqsaq61nU/AdgOlL2c4IKCgjh+/Lj9xzYeo6ocP36coKAgb4dicqgNGzawbds2HnvssTRD2nM6T/ZxlAEOpXocA6Q/Dkupo6pJInIaKAFkPVMeICLFgI7Aa5lsHwIMAShfvvxF28uWLUtMTAxHjx7N9oUYc7mCgoIoW/ayftuYfGDChAmEhoZy9913ezuUS5IrO8dFxA9YCExR1V8zqqOqM4AZ4BxVlX67v78/FStW9GicxhiTmZ9++olPPvmEp59+OkcPvc2IJ4+NfgfKpXpc1lWWYR1XMigKHHej7RnAz6qaOy75NsaYdCZNmkRQUNBFF9DmBp5MHFuByiJSUUQCgN5AZLo6kcAA1/3uwBrNptNBRJ7HmWCGX+F4jTHmqvjqq6+YO3cu/fv3z1HTpbvLY6eqXH0WDwGrAF/gXVXdJSLPAlGqGgnMBOaJyD7gBM7kAoCIHACKAAEi0gVoA5wBxgA/Adtd8yhNVdV3PPU6jDHmStqwYQPt27cnLCyMZ555xtvhXBaP9nGo6gpgRbqysanuxwMZTjSvqmGZNJszlsAyxphLdCFplCtXjjVr1lCqVClvh3RZcs/4L2OMycU++ugjbr/9dsqVK8fatWspXbq0t0O6bJY4jDHGg5KTk3nmmWfo2rUr4eHhrFu3LtceaVyQK4fjGmNMbhAfH0/fvn358MMP6d+/P9OnT88TF4TaEYcxxniAqjJ06FA+/PBDJk6cyOzZs/NE0gA74jDGGI947bXXmDNnDs888wyPPfaYt8O5ouyIwxhjrrAvvviCxx9/nK5du/Lkk096O5wrLtsjDhHxAeoA1wFxwE5VPeLpwIwxJjf67bff6NmzJzVq1GDOnDm5avJCd2WaOESkEjASaA38DBwFgoAqIhILTAfmqGry1QjUGGNyugv9GomJiSxbtoxChQp5OySPyOqI43ngTeD+9NOAiMg1QB/gbv6ZFt0YY/K1JUuW8NlnnzF58mQqVark7XA8Jt+uOW6MMVfSqVOnqFatGmXLlmXz5s34+vp6O6R/LbM1x7M9+SYiPUSksOv+UyLyoYjU90SQxhiTW40aNYqjR48yY8aMPJE0suJOr81TqnpWRJoCrXBOTPimZ8MyxpjcY+XKlUyfPp3hw4dTv37e/13tTuJwuP69A5ihqp8CAZ4LyRhjco/ly5fTpUsX6tatm2tnu71U7iSO30VkOtALWCEigW7uZ4wxedqiRYvo1q0b9erVY82aNXl2FFV67iSAnjjX1GirqqeAEOAJj0ZljDE5XGRkJH369KFJkyZ8/vnnFC9e3NshXTWZjqoSkZCsdlTVEx6JyANsVJUx5ko6ffo01atX59prr+Wbb74hODjY2yF5RGajqrK6jmMboDgXTioPnHTdLwYcBCp6IE5jjMnxxowZw19//UVkZGSeTRpZyfRUlapWVNXrgS+AjqpaUlVLAB2A1VcrQGOMyUk2b97MtGnTeOihh4iIuOjHeL6Q7QWAIvKDqtbKriwns1NVxpgrITExkYiICI4fP87u3bspUqSIt0PyqMu+ABA4LCJPikiY6zYGOOzmk7YTkT0isk9ERmWwPVBEFru2bxaRMFd5CRFZKyJ/i8jUdPs0EJEfXPtMERFbg9wY4zGqyq5du3j55Zdp1qwZO3bsYOrUqXk+aWTFncRxFxAKfOS6XeMqy5KI+AJvALcDNYC7RKRGumqDgJOqegPwKvCyqzweeAoYkUHTbwL3AZVdt3Zu
vAZjjLkso0ePJjw8nFGjRpGQkMDkyZPp0qWLt8PyqmynVXeNnnr0MtpuBOxT1V8BRGQR0BnYnapOZ2Cc6/5SYKqIiKqeA74WkRtSNygipYEiqrrJ9Xgu0AVYeRnxGWNMln7++WcmTpxIr169mDhxImXKlPF2SDmCO+txVMH5yz8sdX1VvTWbXcsAh1I9jgEaZ1ZHVZNE5DRQAjiWRZsx6drM8C8pIkOAIQDly5fPJlRjjLnY6NGjCQwMZPLkyZQqVcrb4eQY7iwd+z7wFvAO/0w/kuOp6gxgBjg7x70cjjEml9m4cSMffPABzzzzjCWNdNxJHEmqejmTGv4OlEv1uKyrLKM6MSLiBxQFjmfTZtls2jTGmH9FVXniiScoVapUnlsv/Epwp3P8YxF5QERKi0jIhZsb+20FKotIRREJAHoDkenqRAIDXPe7A2vSLxqVmqr+AZwRkRtdo6n6A8vdiMUYY9ySnJzMu+++yzfffMOzzz6bb+afuhTuHHFc+GJPPT+VAtdntZOrz+IhnPNc+QLvquouEXkWiFLVSJxTtM8TkX3ACZzJBQAROQAUAQJEpAvQRlV3Aw8As4ECODvFrWPcGPOvHTt2jLfffpuZM2fyyy+/UK9ePQYOHOjtsHIkWwHQGJPvnTp1ioYNG7Jv3z5atGjB4MGD6dq1KwUKFPB2aF51OXNVXdjRHxgG3OIqWgdMV9XEKxqhMcZ4QXJyMnfffTcHDhxg3bp1NG/e3Nsh5XjunKp6E/AHprke3+0qG+ypoIwx5mp58cUX+eSTT3j99dctabjJncTRUFXrpHq8RkS+91RAxhhztXz22WeMHTuWvn378uCDD3o7nFzDraVjRaTShQcicj256HoOY4zJyJIlS+jatSvh4eFMnz4dm/bOfe4kjieAtSKyTkTWA2uAxz0bljHGeEZycjJjx46lV69e1K9fn88//5yCBQt6O6xcxZ25qr4UkcpAVVfRHlU979mwjDHmylNVBgwYwPz587n33nuZNm0agYGB3g4r18n2iENEHgQKqOoOVd0BBIvIA54PzRhjrqwPPviA+fPnM3bsWN555x1LGpfJnYWcolW1brqy71S1nkcju4LsOg5jzNmzZ6levTqhoaFs3boVPz93xgblb5d9HQfg65rqXF0N+QIBVzpAY4zxpKeffprDhw/zwQcfWNL4l9x59z4DFovIdNfj+11lxhiTK3z//fdMmTKFIUOG0Lhx+tUdzKVyJ3GMxJkshrkef45zinVjjMmx1q9fz7fffsuBAwf44osvCAkJ4f/+7/+8HVae4M6oqmQRmY1z5to9ng/JGGP+nU2bNtGiRQsAQkNDCQsLY9q0aRQvXty7geUR7sxV1QkYj7Nfo6KI1AWeVdVOng7OGGMuVXJyMsOHD6d06dLs3LmTkBB3VoEwl8KdU1VP41w/fB2AqkaLSEVPBmWMMZdr4cKFbN68mdmzZ1vS8BB3rhxPVNXT6cry/lzsxphc59y5c4wcOZIGDRpw9913ezucPMudI45dItIH57DcysAjwLeeDcsYYy7d+PHj+f3331m0aBE+Pu78LjaXw5139mGgJnAeWAicAYZ7MihjjLkUqsq8efN4+eWX6dmzJ02bNvV2SHmaO6OqYoExwBgRKQ6cympdcGOMuZqOHDnC/fffz7Jly2jatCmvvfaat0PK8zI94hCRsSJSzXU/UETWAPuAv0Sk9dUK0BhjMrNnzx7Cw8NZsWIF48ePZ926dZQqVcrbYeV5WZ2q6gVcuG5jgKvuNUBz4EV3GheRdiKyR0T2icioDLYHishi1/bNIhKWattoV/keEWmbqvw/IrJLRHaKyEIRCXInFmNM3nLu3Dm6deuGqrJt2zZGjBiBr6+vt8PKF7JKHAmpTkm1BRaqqkNVf8S96z98gTeA24EawF0iUiNdtUHASVW9AXgVeNm1bw2gN86+lXbANBHxFZEyODvnI1Q1HPB11TPG5COqytChQ9m9ezfvvfce4eHh
3g4pX8kqcZwXkXARCQVaAqtTbQt2o+1GwD5V/VVVE4BFQOd0dToDc1z3lwKtxLkMV2dgkaqeV9X9OE+RNXLV8wMKiIifK47DbsRijMkF9u7dy8SJE8muG/Xtt99m/vz5jBs3jttuu+0qRWcuyCpxDMf5Zf4T8KrrCxwRaQ9850bbZYBDqR7HuMoyrKOqScBpoERm+6rq78AE4CDwB3BaVVdjjMkTxo8fz4gRI/jggw8yrbNy5Uoefvhh2rZty5NPPnkVozMXZJo4VHWTqlZT1RKq+lyq8hWqetfVCS8t16iuzkBF4DqgoIj0y6TuEBGJEpGoo0ePXs0wjTGXQVVZtWoVACNHjiQhIeGiOq+//jodOnSgZs2azJ8/367V8JKsRlX1kyxWbxeRSiKS1WDp34FyqR6XdZVlWMd16qkocDyLfVsD+1X1qKomAh8CN2f05Ko6Q1UjVDUiNDQ0izCNMTnBTz/9xKFDh+jWrRu//vor06ZNS9mWlJTEww8/zCOPPELHjh3ZsGEDJUuW9GK0+VtWndwlgGgR2QZsA44CQcANOEdWHQMuGimVylagsmteq99xdmL3SVcnEueIrY1Ad5wz8KqIRALvicgknEcWlYEtQDJwo4gEA3FAK8CW9jMmD/jsM+cyPxMnTuTs2bM8++yz9O/fn1OnTtGvXz82btzIiBEjeOmll2z0lJdlmjhU9TURmQrcCjQBauP8sv4RuFtVD2bVsKomichDwCqco5/eVdVdIvIsEKWqkcBMYJ6I7ANO4Boh5aq3BNgNJAEPqqoD2CwiS4HtrvLvgBmX//KNMTnFqlWrqFq1KhUqVGD8+PHUrVuXnj17smXLFnx8fFi4cCG9e9sgypwg2zXH8wJbc9yYnC0uLo6QkBCGDBmScuX34MGDmTlzJk2bNmX+/PlUqFDBy1HmP/9mzXFjjPGor776ivj4eNq2TbnWl1dffZU77riDjh072hrhOYz9NYwxXrdq1SoCAwNp3rx5SlnhwoW58847vRiVyYyNZTPGeN2qVato1qwZBQsW9HYoxg3ZJg4RuVZEZorIStfjGiIyyPOhGWPyg0OHDrFr1640p6lMzubOEcdsnCOjrnM93outx2GMuUI+/vhjAEscuYg7iaOkquc1JewAACAASURBVC7BeQ3FhalBHB6NyhiT56kqU6ZM4dFHH6V27do2UWEu4k7iOCciJXCtMy4iN+KcU8oYYy7L2bNn6d27N48++ijt27dn3bp1ZDFRhclh3BlV9RjOK7wricg3QCjOq7yNMeaSxcbG0qZNG7Zs2cLLL7/MiBEjbM6pXMadpWO3i0hzoCogwB7XPFHGGHNJkpKSuOuuu9i8eTNLly6la9eu3g7JXAZ3F2RqD4S56rcREVR1kodjM8bkIarKQw89RGRkJFOnTrWkkYu5c6rqYyAe+AFXB7kxxlyq559/nunTpzNq1CgefPBBb4dj/gV3EkdZVa3t8UiMMXnWK6+8wtixYxkwYAAvvviit8Mx/5I7PVIrRaSNxyMxxuRJkyZNYuTIkdx1113MnDnTRk/lAe4ccWwCPhIRHyARZwe5qmoRj0ZmjMnVVJVJkyYxYsQIevTowdy5c20djTzCncQxCbgJ+EHzwxzsxph/LT4+nmHDhjF79my6d+/OggULbIbbPMSdU1WHgJ2WNIwx7jh48CDNmjVj9uzZPP300yxevBh/f39vh2WuIHd+AvwKrHNNcnj+QqENxzXGpJaUlMRbb73Fk08+SXJyMsuWLaNz587eDst4gDuJY7/rFuC6GWNMGps2bWLYsGFER0fTunVrpk2bRuXKlb0dlvEQd64cf+ZqBGKMyZ127txJy5YtKVmyJO+//z7dunWzkVN5XKaJQ0Qmq+pwEfkY1wSHqalqJ49GZozJ8WJjY+nVqxdFixYlKiqKa6+91tshmasgq87xea5/JwATM7hlS0TaicgeEdknIqMy2B4oIotd2zeLSFiqbaNd5XtEpG2q8mIislRE
fhKRH0XkJndiMcZcGlVl7ty51KpVi2XLlmVY5z//+Q+7d+9m7ty5ljTyE1XN8AbMzmybOzfAF/gFuB5n38j3QI10dR4A3nLd7w0sdt2v4aofCFR0tePr2jYHGOy6HwAUyy6WBg0aqDHGfT/99JPeeuutCmhwcLAGBQXpN998k6bOkiVLFNCRI0d6KUrjaUCUZvCdmtURx7+dZqQRsE9Vf1XVBGARkH6IRWdXIgBYCrQS58nRzsAiVT2vqvuBfUAjESkK3ALMBFDVBFU99S/jNMaksmrVKurUqcP27dt566232L9/P+XKlaNjx47s2bOH06dP8+STTzJgwAAaN27Mc8895+2QzVWWVed4sIjUw3ml+EVUdXs2bZfBeQ3IBTFA48zqqGqSiJwGSrjKN6XbtwwQBxwFZolIHWAb8Kiqnkv/5CIyBBgCUL58+WxCNcYAfPfdd3Tv3p1q1arx2WefUapUKQBWrlzJTTfdROvWrYmLi+P48eP07t2bSZMm2TUa+VBWiaMMzr6MjBKHArd6JKKs+QH1gYdVdbOIvAaMAp5KX1FVZwAzACIiIuziRZMnORwOkpOTr8iX98GDB7njjjsoXrw4K1asSEkaAJUqVeLTTz+ldevWNGrUiJdffpn69ev/6+c0uVNWiWOfqv6b5PA7UC7V47KusozqxIiIH1AUOJ7FvjFAjKpudpUvxZk4jMmXevXqxYEDB/jqq68oUKDAJe175MgR1q9fz8mTJzl9+jSzZs0iNjaWr7/+muuuu+6i+g0bNuT48eM2dYhx6wLAy7UVqCwiFXF+6fcG+qSrEwkMADbiXI52jaqqiEQC74nIJOA6oDKwRVUdInJIRKqq6h6gFbDbg6/BmBxLVVm/fj3Hjh3jwQcf5N133812n7i4OBYuXMiiRYv48ssvSU7+Z4mdokWL8tFHHxEeHp7p/pY0DGSdOEb+m4ZdfRYPAatwjrB6V1V3icizOHvqI3F2cs8TkX3ACZzJBVe9JTiTQhLwoKo6XE0/DCwQkQCc06EM/DdxGpNbHT58mGPHjlG1alVmzZpFkyZNGDRoUKb19+7dS48ePdixYweVKlVi9OjRdO7cmdKlS1O0aFEKFSpkF+4Zt4jmg7kLIyIiNCoqytthGHNFffrpp3To0IF169bxwgsvsGHDBjZu3Ei9evUuqrt48WIGDx5MYGAgs2fP5o477rAkYbIlIttUNSJ9uTuz4xpjcqDo6GgA6tWrx4IFCwgNDaVLly7ExMSk1FFVnnzySXr37k2dOnWIjo6mQ4cOljTMv2KJw5hcKjo6mkqVKlGkSBFCQ0OJjIzk1KlTtGnThmPHjgHwzDPP8MILLzB48GDWrl1L2bJlvRy1yQuy7enKZK6q00AUMF1V4z0RmDEma9HR0WlOS9WrV4/IyEjatWtH+/btadeuHc899xwDBw5k+vTp+PjY70RzZbjzSfoV+Bt423U7A5wFqrgeG2OusrNnz7Jv3z7q1q2bprx58+YsWbKE7du389xzz9GvXz/efvttSxrminJnbN3Nqtow1eOPRWSrqjYUkV2eCswYk7nvv/8e4KLEAdCxY0eWLl1KVFQU48aNs3W+zRXnTuIoJCLlVfUggIiUBwq5tiV4LDJjTKYudIxnlDgAunTpQpcuXa5mSCYfcSdxPA58LSK/4Jx+pCLwgIgU5J8JCo0xV1F0dDQlS5bM8ApvYzzNnRUAV4hIZaCaq2hPqg7xyR6LzBiTqejoaOrWrWvDao1XuNtj1gCoCdQBeopIf8+FZIzJSmJiIjt37sz0NJUxnubOcNx5QCUgGrgw7YcCcz0YlzEmE3v27OH8+fOWOIzXuNPHEYFz5b68PzeJMblA6ivGjfEGd05V7QRKZVvLGHNVREdHExQURJUqVbwdismn3DniKAnsFpEtwPkLharayWNRGWNSJCQkMH78eL7//nvi4uLYsmULtWrVsinOjde488kb5+kgjDEZO3jwIL169WLTpk1UqVKFggUL
csMNN3D//fd7OzSTj7kzHHf91QjEGJPWypUr6devH4mJiSxZsoQePXp4OyRjgCz6OETka9e/Z0XkTKrbWRE5c/VCNCZ/OX36NPfddx/t27enbNmyREVFWdIwOUqmiUNVm7r+LayqRVLdCqtqkasXojH5x6effkqNGjV49913+e9//5tyisqYnMSt3jUR8QWuTV3/wtxVxpgrIzIyki5dulCzZk2WLVtGw4YNs9/JGC9w5wLAh4Gngb+ACyvbK1Dbg3EZk6/s2LGDPn36EBERwfr16ylQoIC3QzImU+5cx/EoUFVVa6pqLdfNraQhIu1EZI+I7BORURlsDxSRxa7tm0UkLNW20a7yPSLSNt1+viLynYh84k4cxuRkR44coVOnThQtWpRly5ZZ0jA5njuJ4xDOFf8uiev01hvA7UAN4C4RqZGu2iDgpKreALwKvOzatwbQG+f8WO2Aaa72LngU+PFSYzImp4mNjaVr164cOXKE5cuX22y3JldwdwXAda4jgMcu3NzYrxGwT1V/VdUEYBHQOV2dzvwzNftSoJU4p/vsDCxS1fOquh/Y52oPESkL3AG840YMxuRYp0+fpm3btnz77bfMmTOHiIgIb4dkjFvcSRwHgc+BAKBwqlt2yuA8WrkgxlWWYR1VTcJ5ZFMim30nA//ln/6WDInIEBGJEpGoo0ePuhGuMVfP0aNHadmyJZs2bWLRokU23NbkKu5cAPjM1QjEHSLSATiiqttEpEVWdVV1BjADICIiwiZoNF534sQJtmzZwpYtW5g3bx4xMTEsX76c9u3bezs0Yy6JO6OqqgAjgDDSDse9NZtdfwfKpXpc1lWWUZ0YEfEDigLHs9i3E9BJRNoDQUAREZmvqv2yex3GeNPSpUu56667SEpKQkSoWbMmq1at4pZbbvF2aMZcMneu43gfeAtnn4Ijm7qpbQUqi0hFnF/6vYE+6epEAgOAjUB3YI2qqohEAu+JyCTgOqAysEVVNwKjAVxHHCMsaZicbsuWLdx99900bNiQF154gQYNGlCkiF1Da3IvdxJHkqq+eakNq2qSiDwErAJ8gXdVdZeIPAtEqWokMBOYJyL7gBM4kwuuekuA3UAS8KCqXkrSMiZH+O233+jUqRPXXXcdy5cvJzQ01NshGfOvSXbrM4nIOOAI8BFpp1U/4dHIrqCIiAiNiorydhgmn9m7dy/du3fn4MGDbNy4kerVq3s7JGMuiYhsU9WLhvu5c8QxwPXvE6nKFLj+SgRmTF5y8OBBPvzwQxYsWEBUVBQBAQF88sknljRMnuLOqKqKVyMQY3KrP//8kzfeeIPIyEh27NgBQP369Zk4cSK9e/e2i/pMnuPOqKr+GZWr6twrH44xucvPP//MbbfdxqFDh2jatCnjx4+nY8eOVK1a1duhGeMx7pyqSj1FZxDQCtgOWOIw+Vp0dDRt27YlOTmZzZs325XfJt9w51TVw6kfi0gxnNOHGJNvffXVV3Ts2JEiRYqwevVqqlWr5u2QjLlq3JlyJL1zgPV7mHxr8eLFtG7dmlKlSvH1119b0jD5jjt9HB/jHEUFzkRTA+dFgcbkK6rKK6+8wqhRo7jlllv46KOPCAkJ8XZYxlx17vRxTEh1Pwn4TVVjPBSPMTlSdHQ048aNY/ny5dx1113MmjWLwMBAb4dljFdke6pKVdenun0DHBaRvlchNmO8buPGjXTs2JF69eqxdu1a/u///o/58+db0jD5WqaJQ0SKuNbgmCoibcTpIZzrc/S8eiEac3UlJCTw3nvv0bhxY26++Wa+/fZbnn32WX777TdGjRqFj8/ldA0ak3dkdapqHnAS5wSEg4H/AQJ0UdXoqxCbMVfVDz/8wJw5c5g/fz5//fUXVapUYerUqQwYMIBChQp5OzxjcoysEsf1qloLQETeAf4Ayqtq/FWJzJir5Pz589x5552sXLkSf39/OnbsyKBBg2jXrp0dXRiTgawSR+KFO6rqEJEYSxomr1FV7r//flau
XMkLL7zAkCFDKFmypLfDMiZHyypx1BGRM677AhRwPRZAVdUWFDC53oQJE5gzZw5PP/00//vf/7wdjjG5QqaJQ1V9r2Ygxlxty5cvZ+TIkfTs2ZOxY8d6Oxxjcg07gWvylbNnzzJz5kyaNm1Kly5daNCgAbNmzbK+DGMugf1vMfnC6dOneeaZZyhbtiyDBw/m2LFjvPTSS6xevZrg4GBvh2dMruLOlePG5FpxcXFMmTKFV155hRMnTtC1a1cef/xxbrrpJkTE2+EZkytZ4jB5UnJyMgsWLGDMmDEcOnSI22+/neeee44GDRp4OzRjcj2PnqoSkXYiskdE9onIqAy2B4rIYtf2zSISlmrbaFf5HhFp6yorJyJrRWS3iOwSkUc9Gb/JfRwOBx9++CENGzakf//+XHPNNaxdu5YVK1ZY0jDmCvFY4hARX+AN4HacM+reJSI10lUbBJxU1RuAV4GXXfvWAHoDNYF2wDRXe0nA46paA7gReDCDNk0+lJSUxNtvv02NGjXo1q0bp06dYt68eWzZsoUWLVp4Ozxj8hRPHnE0Avap6q+qmoBz8afO6ep0Bua47i8FWonzxHNnYJGqnlfV/cA+oJGq/qGq2wFU9SzwI1DGg6/B5ALbt2+nUaNGDBkyhMKFC7NkyRL27t1Lv379bLSUMR7gyf9VZYBDqR7HcPGXfEodVU0CTgMl3NnXdVqrHrD5CsZscpG///6bkSNH0qhRI/744w/ef/99tm7dSo8ePfD1tcuQjPGUXPlzTEQKAR8Aw1X1TCZ1hohIlIhEHT169OoGaDzK4XAwa9YsqlSpwiuvvMLAgQPZvXs33bt3t5FSxlwFnkwcvwPlUj0u6yrLsI6I+AFFgeNZ7Ssi/jiTxgJV/TCzJ1fVGaoaoaoRoaGh//KlmJxAVVm5ciUNGzbk3nvvpXz58nzzzTe8/fbbFC9e3NvhGZNveDJxbAUqi0hFEQnA2dkdma5OJDDAdb87sEZV1VXe2zXqqiJQGdji6v+YCfyoqpM8GLvJQRwOBytWrKBx48a0b9+ekydPsnDhQjZu3MjNN9/s7fCMyXc8dh2Hqia5Fn5aBfgC76rqLhF5FohS1UicSWCeiOwDTuBMLrjqLQF24xxJ9aBrht6mwN3ADyJyYU2Q/6nqCk+9DuMd3333HTNmzCA6OpodO3YQGxtLWFgY77zzDv3798ff39/bIRqTb4nzB37eFhERoVFRUd4Ow7hBVXnttdf473//S2BgIPXr16devXrceOONdOvWzRKGMVeRiGxT1Yj05XbluMkxjh07xqBBg4iMjKRTp068++67lChRwtthGWPSyZWjqkzecvToUUaPHk3FihVZuXIlkydPZtmyZZY0jMmh7IjDeIXD4eDrr79m0aJFzJs3j9jYWHr16sXYsWOpXr26t8MzxmTBEoe5an7//Xc2bNjA+vXriYyM5I8//iA4OJju3bszevRoqlWr5u0QjTFusMRhPCo2NpYFCxYwdepUduzYAUDhwoVp3bo1vXr1okOHDhQsWNDLURpjLoUlDuMRZ8+eZfz48bzxxhucOHGCOnXqMHHiRFq0aEGdOnVsShBjcjFLHOaKcjgczJ49mzFjxvDXX39x5513Mnz4cJo1a2bTgRiTR1jiMFfEvn37eP/995k/fz67d+/mpptuIjIykkaNGnk7NGPMFWaJw1y2kydPsmDBAmbNmsX27dsBaNy4Me+99x69e/e2Iwxj8ihLHOaSHD16lNWrV/Pxxx+zbNkyzp8/T/369Zk4cSLdu3enfPny3g7RGONhljhMlhITE9m4cSOrVq1i1apVbN++HVWlZMmS3HfffQwaNIi6det6O0xjzFVkicNc5Ny5c6xcuZIPPviAFStWcObMGXx9fbnxxht59tlnadeuHfXr17fV9YzJpyxx5FMHDhxg5cqVFC9enGuuuQZV5ZtvvmH9+vV8++23xMfHU7JkSXr06EH79u259dZbKVasmLfDNsbkAJY48qH3
3nuPoUOHcvbs2TTlIkLdunUZOnQoXbp0oUmTJvj52UfEGJOWfSvkI8ePH+fxxx9nzpw53HzzzUyfPh0fHx+OHj1KQkICDRs2tKMKY0y2LHHkYUlJSWzfvp3Vq1ezcuVKNm3aBMDYsWN56qmn7GjCGHNZ7JsjF0lMTGTy5MmULl2aXr16XbSoUWxsLFFRUWzcuJGvvvqKDRs2pJyOioiIYMyYMXTv3p3atWt7I3xjTB5hiSOXOHLkCD179mT9+vUAjB49muHDh1OqVCk2btzIxo0b+f7773E4HABUqVKFPn360LJlS1q2bMk111zjzfBNHnHs7/OEBAfg42MXd+ZnljhyIFVl9+7dxMXFERAQwJEjRxg4cCDHjh1j7ty5lChRgvHjxzNixAgAChUqRKNGjRg5ciQ33XQTjRs3JjQ01MuvwnviEhy8+81+utYvQ+miBbwdTp6x6dfj9HtnM7fVuJYpd9XD39eGY+dXljhyCFXl119/ZeHChcybN4+9e/em2V6hQgW+/fZb6tWrB0D79u3ZuXMnDoeD8PBwm202laXbYxi/ag8fbI/howeaULSArVP+bx05E89D731H0QL+fPvLcQ6diOX60ELeDst4iUcTh4i0A14DfIF3VPWldNsDgblAA+A40EtVD7i2jQYGAQ7gEVVd5U6bOZWqcurUKWJiYoiJieG3337j4MGDHDhwgL1797J3796U/ojmzZszYsQISpcuTUJCAg6Hg1atWhESEpKmzfDwcG+8lByvX+PynI1P5NXP9zJs/jZmD2xEgJ97v46Tk5Wv9h1DVWl6Q0n87Fc1iY5kHnxvO+fOJ7H8oSaULBRISMEAVJVEh2b53p47n8S5hCRKFAzE105v5RkeSxwi4gu8AdwGxABbRSRSVXenqjYIOKmqN4hIb+BloJeI1AB6AzWB64AvRKSKa5/s2vQoVSUxMZHY2FjOnDnDmTNnOHXqFCdOnLjodvToUf744w/+/PNP/vjjD2JjY9O05efnR9myZalSpQr33HMP1apV44477qBChQpX6+W4TVU5n5RMoiOZQoF+iAgJScn4+YhHz3df+HJyJDu/oLL68vlkx2Gqly5CpdBCPNDiBq4tHMTj73/PqA92MLFnHZKSlYMnYvHzEQL9fAn08yHAz4dAPx/8fH3Y/Otxnlq+k71//Q3AtUUC6Va/LMNbVyHAz4e/zyfh7yv4iuDrIyQ6lPgkB0WCnEc0P/5xhtgEBz4CBQP9KBToR5EC/hQKdP43i01I4lRsIidjEzgbn0SAnw+hhQIpFxIMwOm4RPx8hHPnkzgZm8ip2ARCCwdyfWghVJUjZ88jAj4ixCU4OBWbSLFgf8qFBJOQlMzaPUfwESHAz4cAXx/8fYUyxQtQumgB4hMdRB86RVyCg/NJDpLV+Z5VL12EiiULcjY+kS37TxDk70uBAF+C/HxRlOuKFsDHR/D39eGlbrWocm3hlPd76pp9TN/wK80ql+TWatdQoURBAvx8qFvOOaR7+KLv+PSHP0h0KL4+QomCATSrHMrEnnUA2PbbSf44Hcep2ETOnU8iWaFa6cK0rOrsj/ts558EB/wTT4CfD8WC/bm2SBBJjmS27D+REoufr/NvWbpoENcWCSI+0cHWAyfwFSHQ34cAX19E4NoiQYQWDuRUbAKf7/6LP0/Hk6xQvKA/xYIDaFChOGWKFSAuwcH+Y+dwJCsOVRzJySQ6lMrXFKJEoUCOnIln8/4TJCQlE5/kIC7BQVKycket0pQLCebP0/H88PtpElz/ZxTnG968yjWEFAzg0IlYdv9xBl8RREDE+fdvW7MUwQF+bPvtJJt+PU7BAF+C/H1dn1NfWlW/hiB/X/YdOcu+I+dSXr8I+IrQomoofr4+/PTnGQ4ci6VBheKEFg68/P+AmfDkEUcjYJ+q/gogIouAzkDqL/nOwDjX/aXAVHFOqdoZWKSq54H9IrLP1R5utHnFNGx8I9u//wFUUU2GZAfqSMp2Px8fH0JC
QihSPIQTycEEFK5ASJn6lC5SksBioQxq04CeLetz1FGA/yzZwXngeyD6b1i0+Fee7xLMLVVC+XbfMR5dHI0jWfER8PPxwc9XeK13XRpUCGH1rj95ctlO/HwEX1/BxzUb7Zt9G1DjuiJ8suMwr3y2B0VJTv4nvvmDG1OxZEGWbD3Ea1/+TFJyMo5k54dPgI8fbsq1RYJ456tfeXPdL8QlOohLdKCuL5sfxrWhcJA/kz7fy/QNv1AwwA8fAVVIVmXnM20REZ5evpPFUYcQJKXtAgG+RD15GwCPLY5m+feHAeeH3scHrikcxIb/tgTg/nlRrN79V8rzAlxfsiBrRrQAoN87m9m8/3jKtkSH0qF2aab2qQ9AtwZl+f1UHEfOOr8c/joTT6uJ6y/6ez3dsQYDm1QkOMAPQXi1Vx0K+PuxJOoQG34+yhNtqwIwcNYWth44mWbfRmEhLBl6EwAPL/yOfUf+TrO9eZVQ5tzr/Oi2mrieP07Hp9meOt6b/+9LziU40mzv07g8L95Zi2SFxi9+eVHsQ5tXYtTt1Th3Pon75227aPsTbavyYMsbOHr2PL1nbLpo+7iONahYsiK/n4pj0Jyoi7ZP7FGHbg3KMn9Q44t+INSvUJyOdUqz5qcjrNz5JwBFgvzYMa4tAOVLFOTeJhUpU7wAR8+e58iZ85Qv4UySqsq9s7dyOi4xTZs9I8rSsuo1JCcrwxZsS/O3BxjctCJPdqjB+aRk+ryz+aJ4H2lVmcduq8KZuETunrnlou1j2lfnvluu5/i5BJ5YuiPT17vr8Gm6v7Xxou1v9WtAu/BS7P7jDA8v/O6i7bXLFqVcSDBbDpzgkQy2L3+wCSEFA/jq52P876MfLtr++X+KUvnawmzZf4Lxq/ZctH3rmNYE+fuyPPowr6/Zd9H2H59th58vLN56iFnfHGDWwIYpifhK8mTiKAMcSvU4BmicWR1VTRKR00AJV/mmdPuWcd3Prk0ARGQIMAS47Blb7+jQkYDSVUB8SFYBHx/w9adG2RBqliuJT2Awn+09g3+BggQVLEpg4SIEFSxCv2bVaVHtWmJOxvLKZxf/8Zs0Lk/ZsiU4f/RvapctxoXZx8UZN8WCnb9gSxQKpHX1a/D1EZIVkhzJJDmUQoHO7dcVK0Cr6teQ5FCSkhVV5++agoHO/o6ShQJpUKG46/0AwflEwQHO7aWKBnFTpRL4+UjKFOiqSqDr1MP1oQVpF16KAv6+BAf4Eujv/JV+oVO06Q0lCfDz4Wx8IqrOX8KpZ1K/qVIJAv19nXG5vgD8U53WaFntGkoXCwLAkexMOgX8/+mraVOjFFWvLUygvy8+IilHOxe0r1Wa2mWLpjxnwUA/7r4x7dHaw7fe4Hr9QomCgbzWuy6OZCU+MZn4RAeJjmQahjlPAdYqW5TPhv+z4FS78FIkJCWnPO7buAItXF9qScmKn49QoeQ/y96+eGctYhOSSFbl3HkH584nEVIwIGX7Ay0q4evjQ/Fgf4oU8CfBkUyJVNtH3l6NuAQHBQP9KBbsT7ECASnvj7jaT1YlWZUgP1+KBftT2XUEUDjIj08faUpyMiQ4HJxPcn5Wwko44wstHMh7gxtTIMCXwP9v7+5jpKrOOI5/f/smsEZeXBUFdJeCGkop4tZCQGKFtmCttsaKRl7UGqMB62uImia1TfzD+NJqazSt0lprqS/1ZTFFS4AWUgVdRBGFtlRphSDSKr4hsDv79I9zZncYZpWLex2983z+2bnn3rn3nDm789xzzt1zaqqpqgq/D4fGu9HGg+t5fPZ4drbl2NGWY1dbDkmMHNQXoGSrcvywBsYPa8DMWP/Ge/zv/d0UHnbl14/e6z15uQ7jVzOb6d+nlr59aqmvq6G6Snu0KJ+8bCLv7mxjZ1uOnW0d7G7vYOghoTy9aqt54KKxABjQnjPach2dgal/fR0PXTyOXIexu72DXe0dmFlni+nIAX1YPvdrHHrQAVRJbI8tvIYDw+fR1FDPXdPHUF1V
1dmqrq0SxwwM729uHMCiKyZSV1NF79pqetVVUxNbZgAThzfwxKUTqI0tv/xN3cC+oT6njhzIqMF98tDpmwAACKlJREFUO2+2LNbhkP4h/5ec9AUumNDIB7ty7GzLdZYh/90wY+xRTBk5sPNvOv97kf/bvWjiUL53/JDOz6OnyYpDek+dWDoTmGJmF8btGcBXzWxOwTFr4zGb4va/CIHgemCFmf0upt8DLIxv+8hzltLc3GytrXvfTTnnnOuepFVm1lycnubI32ZgSMH24JhW8hhJNUBfwiB5d+/dl3M655xLUZqB4zlguKQmSXWEwe6WomNagFnx9ZnAEgtNoBbgbEkHSGoChgPP7uM5nXPOpSi1MY44ZjEHeIrw6Ow8M3tZ0k+AVjNrAe4B7ouD328RAgHxuAcJg97twGwzywGUOmdaZXDOObe31MY4Pkt8jMM555IrxxiHc865DPLA4ZxzLhEPHM455xLxwOGccy6Rihgcl7QN+HeCtzQA/00pO59VlVhmqMxyV2KZoTLL/UnLfJSZ7bVGQ0UEjqQktZZ6kiDLKrHMUJnlrsQyQ2WWO60ye1eVc865RDxwOOecS8QDR2m/LHcGyqASywyVWe5KLDNUZrlTKbOPcTjnnEvEWxzOOecS8cDhnHMuEQ8cBSRNkfR3SRskXVPu/KRF0hBJSyW9IullSZfF9AGSFkn6Z/zZv9x57WmSqiWtlvRE3G6StDLW+QNxuv5MkdRP0sOS1ktaJ2lc1uta0hXxd3utpPmSemWxriXNk/RmXBQvn1aybhXcHsu/RtKY/b2uB45IUjVwBzAVGAGcI2lEeXOVmnbgKjMbAYwFZseyXgMsNrPhwOK4nTWXAesKtm8Efmpmw4C3ge+XJVfpug140syOBb5MKH9m61rSIOAHQLOZjSQswXA22azr3wBTitK6q9uphLWNhhOW1b5zfy/qgaPLCcAGM3vVzHYDfwBOL3OeUmFmW8zs+fj6PcIXySBCee+Nh90LfKc8OUyHpMHAt4C747aAk4GH4yFZLHNfYCJh7RvMbLeZbSfjdU1Ya6h3XFm0D7CFDNa1mS0jrGVUqLu6PR34rQUrgH6SDt+f63rg6DIIeL1ge1NMyzRJjcBxwErgMDPbEne9ARxWpmyl5WfAXKAjbh8MbDez9ridxTpvArYBv45ddHdLqifDdW1mm4Gbgf8QAsY7wCqyX9d53dVtj33HeeCoYJIOBP4IXG5m7xbui0v4ZuZZbUmnAm+a2apy5+VTVgOMAe40s+OADyjqlspgXfcn3F03AUcA9ezdnVMR0qpbDxxdNgNDCrYHx7RMklRLCBr3m9kjMXlrvukaf75ZrvylYDxwmqSNhG7Ikwl9//1idwZks843AZvMbGXcfpgQSLJc15OB18xsm5m1AY8Q6j/rdZ3XXd322HecB44uzwHD45MXdYTBtJYy5ykVsW//HmCdmd1asKsFmBVfzwIe/7TzlhYzu9bMBptZI6Ful5jZucBS4Mx4WKbKDGBmbwCvSzomJk0CXiHDdU3oohorqU/8Xc+XOdN1XaC7um0BZsanq8YC7xR0aSXi/zleQNIphH7wamCemd1Q5iylQtIEYDnwEl39/dcRxjkeBI4kTEN/lpkVD7x97kk6CbjazE6VNJTQAhkArAamm9mucuavp0kaTXggoA54FTifcNOY2bqW9GNgGuEJwtXAhYT+/EzVtaT5wEmE6dO3Aj8CHqNE3cYg+gtCt90O4Hwza92v63rgcM45l4R3VTnnnEvEA4dzzrlEPHA455xLxAOHc865RDxwOOecS8QDh8sMSTlJL8QZURdI6pfCNa6RdG6J9KmSWuOMw6sl3fIx57le0tU9nb+PuF5VnBl1raSXJD0nqSnu+1Man5XLLg8cLks+NLPRcUbUt4DZKVzjm8CfCxMkjSQ8Hz89zjjcDGxI4dqfxDTC9BujzOxLwHeB7QBmdkqc+NC5feKBw2XVM8QJ3OJ/yt5UcLc9LabfIem0
+PpRSfPi6wsk7fXPn5IOAurMbFvRrrnADWa2HsDMcmZ2Z3xPo6Qlcf2DxZKOLHHev0hqjq8b4rQoSDpP0mNxTYWNkuZIujK2aFZIGlDw/hslPSvpH5JOLPF5HA5sMbOOmMdNZvZ2fP/GeN2LY4vtBUmvSVoa939D0jOSnpf0UJzjzFUwDxwuc+LaKpPomjLmDGA0YS2KycBNcQ6f5UD+S3YQYR0WYtqyEqeeTFjfoNhIwuyrpfwcuNfMRgH3A7cnKkw49xnAV4AbgB1xssJngJkFx9WY2QnA5YT/Hi72IPDtGBRukXRc8QFmdpeZjY7X2gTcKqkB+CEw2czGAK3AlQnL4DLGA4fLkt6SXqBrKulFMX0CMD+2BLYCfyV8OS4HToyLWL1C1+Rw44CnS5x/CrAwYZ7GAb+Pr++LeUliqZm9F1s57wALYvpLQGPBcfmJKlcVpQOhhQEcA1xLmGZmsaRJ3VzzNsJcXgsIC32NAP4WP9tZwFEJy+AypubjD3Huc+NDMxstqQ/wFGGMo9s7fDPbHAeFpxBaGAOAs4D34wJXxU4ALimR/jJwPPDifua7na6buF5F+wrnUuoo2O5gz7/ffHqObv6u47xMC4GFkrYSFvjZowUl6TxCYJiTTwIWmdk5+1gWVwG8xeEyx8x2EJYOvSpOo70cmKaw3vghhBXxno2HryB07yyLx10df+5B0heB9WaWK3HJm4DrJB0dj62SdHHc9zRhNl6Ac0udG9hICDzQNXtrj5I0RtIR+fwBowgT4BUeczyh/NPzYyGEz2e8pGHxmPp8OV3l8sDhMsnMVgNrgHOAR+PrF4ElwNw43TiEL/IaM9sAPE9odZT6cp8KPNnNtdYQgs98SeuAtcDQuPtS4HxJa4AZhDXPi90MXCJpNWGW0zQcCiyQtJbwWbQTngQrNIdQ/qVxLOTu2EV2HqFsawhjK8emlEf3OeGz4zq3DyQtAmbu7/oFzmWJBw7nnHOJeFeVc865RDxwOOecS8QDh3POuUQ8cDjnnEvEA4dzzrlEPHA455xL5P9y/7ENwTa9oQAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "import time\n",
    "\n",
    "numpy_run_times = []\n",
    "for_loop_run_times = []\n",
    "\n",
    "matrix_sizes = range(1, 101)\n",
    "for size in matrix_sizes:\n",
    "    matrix = np.ones((size, size))\n",
    "    \n",
    "    start_time = time.time()\n",
    "    matrix @ matrix\n",
    "    numpy_run_times.append(time.time() - start_time)\n",
    "    \n",
    "    start_time = time.time()\n",
    "    for i in range(size):\n",
    "        for j in range(size):\n",
    "            matrix[i] @ matrix[:,j]\n",
    "            \n",
    "    for_loop_run_times.append(time.time() - start_time)\n",
    "    \n",
    "plt.plot(matrix_sizes, numpy_run_times, \n",
    "         label='NumPy Matrix Product', linestyle='--')\n",
    "plt.plot(matrix_sizes, for_loop_run_times, \n",
    "         label='For-Loop Matrix Product', color='k')\n",
    "plt.xlabel('Row / Column Size')\n",
    "plt.ylabel('Running Time (Seconds)')\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When it comes to matrix multiplication, NumPy greatly outperforms basic Python. NumPy matrix-product code is more efficient both to run and to write. We'll now use NumPy to compute our all-by-all text similarities as efficiently as possible.\n",
    "\n",
    "### 13.3.2. Computing All-By-All Matrix Similarities\n",
    "\n",
    "Running `unit_vectors @ unit_vectors.T` returns a matrix of all-by-all cosine similarities. That matrix should equal our previously computed `cosine_similarities` array. Let's confirm.\n",
    "\n",
    "**Listing 13. 45. Obtaining cosines from a matrix product**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [],
   "source": [
    "cosine_matrix = unit_vectors @ unit_vectors.T\n",
    "assert np.allclose(cosine_matrix, cosine_similarities)"
   ]
  },
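  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The single matrix product works because element `[i, j]` of `U @ U.T` is the dot product of row `i` with row `j`, and for unit vectors that dot product is a cosine. The cell below is a small, self-contained sketch of this idea. It assumes 3 made-up word-count vectors over a 4-word vocabulary (not our actual texts). The names `counts`, `U`, `cosine_all`, and `pairwise` are hypothetical and exist only for this illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Hypothetical word-count vectors: 3 texts over a 4-word vocabulary\n",
    "counts = np.array([[1, 2, 0, 1],\n",
    "                   [0, 1, 3, 1],\n",
    "                   [2, 0, 1, 0]], dtype=float)\n",
    "\n",
    "# Divide every row by its magnitude so that each row becomes a unit vector\n",
    "U = counts / np.linalg.norm(counts, axis=1, keepdims=True)\n",
    "\n",
    "# One matrix product: element [i, j] is the dot product of rows i and j\n",
    "cosine_all = U @ U.T\n",
    "\n",
    "# The same cosines, computed one pair at a time with nested loops\n",
    "pairwise = np.array([[U[i] @ U[j] for j in range(len(U))]\n",
    "                     for i in range(len(U))])\n",
    "\n",
    "assert np.allclose(cosine_all, pairwise)\n",
    "assert np.allclose(np.diag(cosine_all), 1.0)  # each text matches itself\n",
    "print(np.round(cosine_all, 3))"
   ]
  },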
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each element in `cosine_matrix` equals the cosine of the angle between 2 vectorized texts. Because every row of `unit_vectors` has a magnitude of 1, that cosine can be transformed directly into a Tanimoto value, which reflects both the word overlap and the word divergence between 2 texts. Using NumPy arithmetic, we can convert `cosine_matrix` into a Tanimoto similarity matrix by running `cosine_matrix / (2 - cosine_matrix)`.\n",
    "\n",
    "**Listing 13. 46. Converting cosines to a Tanimoto matrix**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [],
   "source": [
    "tanimoto_matrix = cosine_matrix / (2 - cosine_matrix)\n",
    "assert np.allclose(tanimoto_matrix, similarities)"
   ]
  },
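  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The conversion can be derived from the Tanimoto similarity of 2 real-valued vectors, written here in its standard dot-product form:\n",
    "\n",
    "$$T(\\mathbf{a}, \\mathbf{b}) = \\frac{\\mathbf{a} \\cdot \\mathbf{b}}{\\|\\mathbf{a}\\|^2 + \\|\\mathbf{b}\\|^2 - \\mathbf{a} \\cdot \\mathbf{b}}$$\n",
    "\n",
    "When $\\|\\mathbf{a}\\| = \\|\\mathbf{b}\\| = 1$, the dot product equals the cosine of the angle between the vectors, $\\mathbf{a} \\cdot \\mathbf{b} = \\cos\\theta$, so\n",
    "\n",
    "$$T(\\mathbf{a}, \\mathbf{b}) = \\frac{\\cos\\theta}{1 + 1 - \\cos\\theta} = \\frac{\\cos\\theta}{2 - \\cos\\theta}$$\n",
    "\n",
    "Because NumPy applies `-` and `/` element by element, `cosine_matrix / (2 - cosine_matrix)` performs this conversion on every cell of the matrix at once."
   ]
  },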
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We've computed all the Tanimoto similarities in just 2 lines of code. We can also compute these similarities by passing `unit_vectors` and `unit_vectors.T` directly into our `normalized_tanimoto` function.\n",
    "\n",
    "**Listing 13. 47. Inputting matrices into `normalized_tanimoto`**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [],
   "source": [
    "output = normalized_tanimoto(unit_vectors, unit_vectors.T)\n",
    "assert np.array_equal(output, tanimoto_matrix)"
   ]
  },
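  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Why can the same function accept either 2 vectors or 2 matrices? The `@` operator returns a scalar for 2 vectors and a matrix for 2 matrices, and `-` and `/` then operate element by element. The sketch below illustrates this with a hypothetical stand-in, `tanimoto_from_unit`, which we assume mirrors the logic of `normalized_tanimoto` (dot product divided by 2 minus the dot product, for unit-vector inputs). The random data is made up for the example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Hypothetical stand-in for normalized_tanimoto; assumes both inputs hold\n",
    "# unit vectors, so every dot product equals a cosine\n",
    "def tanimoto_from_unit(u1, u2):\n",
    "    dot = u1 @ u2           # scalar for 2 vectors, matrix for 2 matrices\n",
    "    return dot / (2 - dot)  # element-wise when dot is an array\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "vectors = rng.random((4, 6))\n",
    "unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)\n",
    "\n",
    "# Vector inputs produce a single similarity score\n",
    "single = tanimoto_from_unit(unit[0], unit[1])\n",
    "\n",
    "# Matrix inputs produce the full 4 x 4 similarity matrix in one call\n",
    "matrix = tanimoto_from_unit(unit, unit.T)\n",
    "\n",
    "assert np.isclose(matrix[0, 1], single)\n",
    "assert np.allclose(np.diag(matrix), 1.0)  # cos = 1 gives 1 / (2 - 1) = 1\n",
    "print(matrix.shape)"
   ]
  },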
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 13.4. Computational Limits of Matrix Multiplication\n",
    "\n",
    "Matrix-multiplication speed depends on matrix size. NumPy is optimized for speed, but even NumPy has its limits. These limits become obvious when we compute products of real-world text matrices.\n",
    "\n",
    "Let's assume that 30 novels share a vocabulary of 50,000 words, and that we want the all-by-all similarities across those 30 books. How long will it take to compute these similarities? Let's find out! We'll create a 30-book by 50,000-word `book_matrix` in which every row is normalized. Afterward, we'll measure the running time of `normalized_tanimoto(book_matrix, book_matrix.T)`.\n",
    "\n",
    "**Listing 13. 48. Timing an all-by-all comparison of 30 novels**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "It took 0.0048 seconds to compute the similarities across a 30-book by 50000-word matrix\n"
     ]
    }
   ],
   "source": [
    "vocabulary_size = 50000\n",
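    "# Each element equals 1 / sqrt(vocabulary_size), so every row\n",
    "# has a magnitude of 1 (a unit vector)\n",
    "normalized_vector = [1 / vocabulary_size ** 0.5] * vocabulary_size\n",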
    "book_count = 30\n",
    "\n",
    "def measure_run_time(book_count):\n",
    "    book_matrix = np.array([normalized_vector] * book_count)\n",
    "    start_time = time.time()\n",
    "    normalized_tanimoto(book_matrix, book_matrix.T)\n",
    "    return time.time() - start_time\n",
    "\n",
    "run_time = measure_run_time(book_count)\n",
    "print(f\"It took {run_time:.4f} seconds to compute the similarities across a \"\n",
    "      f\"{book_count}-book by {vocabulary_size}-word matrix\")"
   ]
  },
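  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Why should the running time grow as books are added? A rough operation count gives a hint. The sketch below assumes a naive cost model: each of the n × n cells in the output of an (n × v) by (v × n) product is a dot product over v words, so the product takes about n² · v multiply-adds. The helper `multiply_add_count` is hypothetical, and the model ignores the low-level optimizations NumPy relies on, so it predicts the trend rather than actual seconds. Under this model, growing from 30 to 1,000 books multiplies the work by roughly 1,100, even though the vocabulary size never changes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Naive cost model (an assumption for illustration only)\n",
    "def multiply_add_count(book_count, vocabulary_size=50000):\n",
    "    # One length-v dot product for each of the n * n output cells\n",
    "    return book_count * book_count * vocabulary_size\n",
    "\n",
    "baseline = multiply_add_count(30)\n",
    "for book_count in [30, 300, 1000]:\n",
    "    ops = multiply_add_count(book_count)\n",
    "    print(f'{book_count:>5} books: {ops:,} multiply-adds '\n",
    "          f'({ops / baseline:,.0f}x the 30-book cost)')"
   ]
  },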
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Will the running time stay reasonable as the number of analyzed books continues to rise? Let's check. We'll plot the running times across multiple book counts, ranging from 30 to nearly 1,000. For consistency's sake, we'll keep the vocabulary size at 50,000.\n",
    "\n",
    "**Listing 13. 49. Plotting book-counts vs running-times**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYgAAAEGCAYAAAB/+QKOAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAAeNklEQVR4nO3dfZRddX3v8ffH8DSIkACpF4aHBAwgykPqgGi4QHkKtRayKJZQ0VBR7lWgei3UsGwvQl1NLFblVq6SKhWpFxSKuUHEiAbQ9hbNhARCQiMhKGRQjEDEpSMm4Xv/2PuQk8meM78zOfs8fl5rnTVn//be53z3nOR8Z/8eFRGYmZmN9KpWB2BmZu3JCcLMzAo5QZiZWSEnCDMzK+QEYWZmhXZqdQCNsu+++8aUKVNaHYaZWUdZtmzZLyJictG+rkkQU6ZMYXBwsNVhmJl1FEk/GW2fq5jMzKyQE4SZmRVygjAzs0JOEGZmVsgJwszMCjlBmJlZIScIMzMr5ARhZmaFnCDMzKyQE4SZmRVygjAzs0JOEGZmVsgJwszMCjlBmJlZIScIMzMr1DXrQZiZdYKFy4e4bvEantk4zP4T+7hy5uHMmt7f6rAKOUGYmTXJwuVDXHXnSoY3bQFgaOMwV925EqAtk4SrmMzMmuS6xWteSQ4Vw5u2cN3iNS2KqDYnCDOzJnlm43Bd5a3mBGFm1iT7T+yrq7zVnCDMzJrkypmH07fzhG3K+naewJUzD29RRLW5kdrMrEkqDdHuxWRm1kNSu6/Omt7ftglhJCcIM7Md1GndV1O5DcLMbAd1WvfVVE4QZmY7qNO6r6ZygjAz20Gd1n01VakJQtJZktZIWitpbsH+D0taLekRSd+VdHDVvi2SVuSPRWXGaWa2Izqt+2qq0hqpJU0AbgDOANYDSyUtiojVVYctBwYi4jeS3g/8PXB+vm84Io4tKz4zs0bptO6rqcrsxXQ8sDYi1gFIug04B3glQUTEfVXHPwhcWGI8Zmal6aTuq6nKrGLqB56u2l6fl43mYuCequ3dJA1KelDSrDICNDOz0bXFOAhJFwIDwMlVxQdHxJCkQ4AlklZGxBMjzrsEuATgoIMOalq8Zma9oMw7iCHgwKrtA/KybUg6HfgocHZEvFQpj4ih/Oc64H5g+shzI2JBRAxExMDkyZMbG72ZWY8rM0EsBaZJmippF2A2sE1vJEnTgRvJksPPq8onSdo1f74vMIOqtgszMytfaVVMEbFZ0mXAYmACcFNErJJ0LTAYEYuA64A9gNslATwVEWcDrwdulPQyWRKbP6L3k5mZlUwR0eoYGmJgYCAGBwdbHYaZWUeRtCwiBor2eSS1mZkVcoIwM7NCThBmZlbICcLMzAo5QZiZWSEnCDMzK+QEYWZmhZwgzMyskBOEmZkVaovZXM3M2tnC5UNdtxhQCicIM7MaFi4f4qo7VzK8aQsAQxuHuerOlQBdnyRcxWRmVsN1i9e8khwqhjdt4brFa1oUUfOMeQch6VXAMcD+wDDwaPXU3GZm3eyZjcN1lXeTUROEpEOBjwCnA48DG4DdgMMk/YZsHYebI+LlZgRqZtYK+0/sY6ggGew/sa8F0TRXrSqmjwP/AhwaETMj4sKIOC8ijgbOBvYC3tWMIM3MWuXKmYfTt/OEbcr6dp7AlTMPb1FEzTPqHUREXFBj38+Bz5QSkZlZG6k0RLsXUwFJ7wC+FRG/kvQ3ZGtDfzwiHio9OjOzNjBren9PJISRUnox/U2eHE4ETgO+CHyu3LDMzKzVUsZBVPp3/RGwICLulvTxEmMyM9thvTq4rZFSEsSQpBuBM4BPSNoVj58wszbWDYPb2iHBpXzR/ymwGJgZERuBvYErS43KzGwHdPrgtkqCG9o4TLA1wS1cPtTUOEZNEJL2lrQ32diH+4Hn8u2XgMHmhGdmVr9OH9zWLgmuVhXTMiAAAQcBL+TPJwJP
AVNLj87MbBw6fXBbuyS4Ue8gImJqRBwCfAf444jYNyL2Ad4OfLtZAZqZ1avTB7eNlsianeBS2iBOiIhvVjYi4h7greWFZGa2Y2ZN72feuUfRP7EPAf0T+5h37lEd00DdLgkupRfTM5L+mmzaDYB3As+kvLiks4DrgQnAFyJi/oj9HwbeC2wmm+vpPRHxk3zfHOCv80M/HhE3p7ynmRl09uC2dhm9rYiofUDWMH01cFJe9D3gmoh4fozzJgA/Iuseux5YClwQEaurjvkD4AcR8RtJ7wdOiYjz8/ccBAbI2kGWAW+KiBdGe7+BgYEYHHTbuZlZPSQti4iBon1j3kHkieCD43jf44G1EbEuD+I24BzglQQREfdVHf8gcGH+fCZwbyUJSboXOAu4dRxxmJl1pbLHSqTMxXQYcAUwpfr4iDh1jFP7gaerttcDb65x/MXAPTXO3e6qJV0CXAJw0EEHjRGOmVn3aMZgwJQ2iNuBzwNfYOu0Gw0l6UKy6qST6zkvIhYACyCrYiohNDOztlRrrEQzE8TmiBjP5HxDwIFV2wfkZduQdDrwUeDkiHip6txTRpx7/zhiMDPrSs0YK5HSzfUuSR+QtF9ldHXeiDyWpcA0SVMl7QLMBhZVHyBpOtnKdGePWMZ0MXCmpEmSJgFn5mVmZkZzxkqkJIg5ZHMv/T+y3kTLSJhqIyI2A5eRfbE/BnwtIlZJulbS2flh1wF7ALdLWiFpUX7u88DfkiWZpcC1Y/WaMjPrJc0YKzFmN9dO4W6uZtZrGtGLaYe6uUraGXg/W8dB3A/cGBGb6orCzMwaquzBgCmN1J8Ddgb+d779rrzsvWUFZWZmrZeSII6LiGOqtpdIerisgMzMrD2kNFJvkXRoZUPSIZQ0HsLMzNpHyh3ElcB9ktaRrQdxMPDnpUZlZmYtlzIX03clTQMqfafWVA1oMzOzLjVmFZOkS4G+iHgkIh4Bdpf0gfJDMzOzVkppg3hfRGysbORTbr+vvJDMzKwdpCSICZJU2cjXedilvJDMzKwdpDRSfwv4qqQb8+3/lpeZmVkXS0kQHyFLCu/Pt+8lm/rbzKyjlb3gTqdL6cX0sqQvAUsiYk35IZmZla8ZC+50upReTGcDK8irlSQdW5l11cysU9VacMcyKY3UV5OtL70RICJWAFPLDMrMrGzNWHCn06UkiE0R8csRZd0xR7iZ9axmLLjT6VISxCpJf0bW3XWapH8kWzzIzKxjNWPBnU6XkiAuB94AvATcCrwIfKjMoMzMyjZrej/zzj2K/ol9COif2Me8c49yA3WVulaUy9eH3hhtuAydV5QzM6tfrRXlRr2DkPQ/JR2RP99V0hJgLfCspNPLCdXMzNpFrSqm84FKf685+bG/B5wM/F3JcZmZWYvVShC/q6pKmgncGhFbIuIx0kZgm5lZB6uVIF6S9EZJk4E/AL5dtW/3csMyM7NWq3Un8CHgDmAy8OmIeBJA0tuA5U2IzczMWmjUBBERDwJHFJR/E/hmmUGZmY3GE+w1T61eTBdWrwNRsP9QSSeWE5aZ2fYqE+wNbRwm2DrB3sLlQ60OrSvVaoPYB1gh6SZJl0r6U0nvlnStpAeAvweerfXiks6StEbSWklzC/afJOkhSZslnTdi3xZJK/KHJwc0M0+w12S1qpiul/RZ4FRgBnA0MAw8BrwrIp6q9cL5ynM3AGcA64GlkhZFxOqqw54CLgKuKHiJ4Yg4to5rMbMu5wn2mqtmd9WI2EK2QNC943jt44G1EbEOQNJtwDnAKwkiIn6c73t5HK9vZj1m/4l9DBUkA0+wV46UuZjGqx94ump7fV6WajdJg5IelDSr6ABJl+THDG7YsGFHYjWzDuAJ9pqrnQe8HRwRQ5IOAZZIWhkRT1QfEBELgAWQzcXUiiDNrHkqvZXci6k5ykwQQ8CBVdsH5GVJImIo/7lO0v3AdOCJmieZWdebNb3fCaFJUpYcfa2kL0q6J98+UtLFCa+9FJgmaaqkXYDZ
QFJvJEmTJO2aP9+XrJF8de2zzMyskVLaIL4ELAb2z7d/RMJ6EBGxGbgsP/cx4GsRsSrvJns2gKTjJK0H3gHcKGlVfvrrgUFJDwP3AfNH9H4ysy60cPkQM+YvYercu5kxf4nHN7TYmOtBSFoaEcdJWh4R0/OyFe3WBdXrQZh1tsoguOpxDn07T/AiPiUb13oQVX4taR/ydaglnQCMXKPazGyHeBBc+0lppP4wWdvBoZL+nWzyvvNqn2JmVh8Pgms/YyaIiHhI0snA4YCANRGxqfTIzKyneBBc+0npxTQBeBtwGnAmcLmkD5cdmJn1Fg+Caz8pVUx3Ab8FVgKeEsPMSuFBcO0nJUEcEBFHlx6JmfU8D4JrLym9mO6RdGbpkZiZWVtJuYN4EPi6pFcBm8gaqiMi9iw1MjMza6mUBPEp4C3AyhhrVJ2ZmXWNlCqmp4FHnRzMzHpLyh3EOuD+fLK+lyqFEfGp0qIyM7OWS0kQT+aPXfKHmVldFi4fcvfVDpQykvqaZgRiZt1p5CR8QxuHuerOlQBOEm1u1AQh6TMR8SFJd5FP1FctIs4uNTIz6wq1JuFzgmhvte4gbsl/frIZgZhZd/IkfJ2rVoK4HLgoIh5oVjBm1n08CV/nqtXN1dNrmNkO8yR8navWHcTukqaTjZzeTkQ8VE5IZtZJxuqh5En4OletBNEP/APFCSKAU0uJyMw6RmoPJU/C15lqJYi1EeEkYGajcg+l7pYyUM7MelDK4Db3UOputRqpP9K0KMysrVSqjoY2DhNsrTpauHxom+NG64nkHkrdYdQEERHfbmYgZtY+alUdVXMPpe7mKiYz205q1ZF7KHU3Jwgz2049g9vcQ6l7jbkehKS7JC0a8bhF0gcl7TbGuWdJWiNpraS5BftPkvSQpM2Szhuxb46kx/PHnPovzaz3LFw+xIz5S5g6925mzF+yXZtBKlcdGaSvBzEZuDXfPh/4FXAY8E/Au4pOkjQBuAE4A1gPLJW0KCJWVx32FHARcMWIc/cGrgYGyMZcLMvPfSHtssx6TyNnTXXVkUFagnhrRBxXtX2XpKURcZykVTXOO55sLMU6AEm3AecArySIiPhxvu/lEefOBO6NiOfz/fcCZ7E1SZnZCI0ek+CqI0tZcnQPSQdVNvLne+Sbv6txXj/ZcqUV6/OyFEnnSrpE0qCkwQ0bNiS+tFl38pgEa7SUBPGXwL9Juk/S/cD3gSskvRq4uczgxhIRCyJiICIGJk+e3MpQzFrOYxKs0cZMEBHxTWAa8CHgg8DhEXF3RPw6Ij5T49Qh4MCq7QPyshQ7cq5ZT3LDsjVaajfXNwFT8uOPkUREfHmMc5YC0yRNJftynw38WeL7LQb+TtKkfPtM4KrEc816khuWrdHGTBCSbgEOBVYAlRawAGomiIjYLOkysi/7CcBNEbFK0rXAYEQsknQc8HVgEvDHkq6JiDdExPOS/pYsyQBcW2mwNrPRpTYsp8yzZKaI7Zab3vYA6THgyBjrwBYbGBiIwcHBVodhVppGfamP7A4LWVXUvHOPcpLoQZKWRcRA0b6URupHgf/S2JDMrB6pk+elSJ1nySylDWJfYLWkHwIvVQoj4uzSojKzbTRyjIO7w1qqlATxsbKDMLPaGvmlXs88S9bbxkwQEfFAMwIxs9E18kv9ypmHF7ZBuDusjTRqG4Skf8t//krSi1WPX0l6sXkhmlkjxzjMmt7PvHOPon9iHwL6J/a5gdoKjXoHEREn5j9f07xwzKxIo8c4eJ4lS5E0UC6fmfW11cdHxFNlBWVm2/OXujVbykC5y8mm3n4WqMy6GsDRJcZlZmYtlnIHUZl/6bmygzHrRR7VbO0qJUE8Dfyy7EDMutFYX/6NXOTHrNFSV5S7X9LdbDtQ7lOlRWXWBVK+/Bu9yI9ZI6VMtfEUcC+wC/CaqoeZ1ZAypYVHNVs7Sxkod00zAjHrNilf/h7VbO1szDsISYdJWiDp25KWVB7NCM6sk6Ws8OZFfqyd
pbRB3A58HvgCW9eDMOtpKT2PUqa08CI/1s5SEsTmiPhc6ZGYdYjUnkepX/4eAGftKiVB3CXpA2Qrv1X3YvIKb9aT6ul55C9/62QpCWJO/vPKqrIADml8OGbtzz2PrFek9GKa2oxAzDqFex5Zr0iZi+ndReUR8eXGh2PWemM1QHs9BesVKVVMx1U93w04DXgIcIKwrpPSAO2eR9YrUqqYLq/eljQRuK20iMxaKLUB2o3P1gtSptoY6deA2yWsK7kB2myrlDaIu8h6LUGWUI4kGzxn1lFSBre5Adpsq5Q2iE9WPd8M/CQi1pcUj1kpUge3uQHabKsxq5gi4oGqx78Dz0h6Z8qLSzpL0hpJayXNLdi/q6Sv5vt/IGlKXj5F0rCkFfnj83Vel/WYhcuHmDF/CVPn3s2M+UtYuHxom/0pM6tClizmnXsU/RP7ENA/sY955x7l9gbrSaPeQUjaE7gU6AcWkU35fSlwBfAw8JVaL5yvY30DcAawHlgqaVFErK467GLghYh4naTZwCeA8/N9T0TEseO6KuspKXcH9bQtuAHaLFPrDuIW4HBgJfBe4D7gHcCsiDgn4bWPB9ZGxLqI+B1Zz6eR550D3Jw/vwM4TZLqiN+63Fh3BpB2d5Ays6qZbatWgjgkIi6KiBuBC8gap2dGxIrE1+4nW660Yn1eVnhMRGwmW9p0n3zfVEnLJT0g6b8WvYGkSyQNShrcsGFDYljWKSp3BkMbhwm23hmMTBIpdweeVtusfrUSxKbKk4jYAqyPiN+WHxIAPwUOiojpwIeB/5NXeW0jIhZExEBEDEyePLlJoVmzpLYbpNwduG3BrH61ejEdI+nF/LmAvnxbQETEdl/YIwwBB1ZtH5CXFR2zXtJOwF7AcxER5DPHRsQySU8AhwGDCddkXSK13SC155HbFszqM+odRERMiIg988drImKnqudjJQeApcA0SVMl7QLMJmvsrraIrbPFngcsiYiQNDlv5EbSIcA0YF29F2edLbXdwHcHZuVIGQcxLhGxWdJlwGJgAnBTRKySdC0wGBGLgC8Ct0haCzxPlkQATgKulbQJeBn4715/ovfUMybBdwdmjaesNqfzDQwMxOCga6DaQcqI5dTjUl/LzMZH0rKIGCjaV9odhPWm1BHL9Szb6YRg1hrjmazPbFSpPY9SjzOz1nGCsIZK7XnkWVPN2p+rmKwuY7UJpM6G6llTzdqf7yAsWcrI5tQRyx7ZbNb+nCAsWUq7QeqYBI9dMGt/rmKyZKntBqk9j9xDyay9+Q7CknlGVLPe4juIHtCoAWlebc2stzhBdLmUAWn1DFoDPLLZrEc4QXS5Wg3L1V/4Yx1T4XYDs97hNogul9Kw7EFrZlbECaLLpTQsu/HZzIo4QXS5lAFpHrRmZkXcBtHlUhqW3fhsZkW8HkSb8loJZtYMXg+iwzSya6qZ2Xi5DaINpcx55PUUzKxsvoNospRqIXdNNbN24ASRoFF1/anVQilrJXg9BTMrm6uYxpCyBkL1sTPmL2Hq3LuZMX/JdsekVgu5a6qZtQMniDGkfqmnJJJ6pssea60Er6dgZmXr+SqmsaqPUr/UU+YzqqdaKGXOI8+LZGZl6uk7iJS/+lOnoUhJJK4WMrNO0tMJIqX6KPVLPSWRuFrIzDpJqVVMks4CrgcmAF+IiPkj9u8KfBl4E/AccH5E/DjfdxVwMbAF+IuIWNzo+FL+6k+dhiJ1MR1XC5lZpygtQUiaANwAnAGsB5ZKWhQRq6sOuxh4ISJeJ2k28AngfElHArOBNwD7A9+RdFhEbPvn/g5KbRNIbQ8Az2dkZt2jzDuI44G1EbEOQNJtwDlAdYI4B/hY/vwO4LOSlJffFhEvAU9KWpu/3n80MsBGL6HpuwMz6yZltkH0A09Xba/PywqPiYjNwC+BfRLPRdIlkgYlDW7YsKHuAN0mYGY2uo7u5hoRC4AFkM3mOp7X8F/9ZmbFyryDGAIOrNo+IC8rPEbSTsBeZI3VKeea
mVmJykwQS4FpkqZK2oWs0XnRiGMWAXPy5+cBSyJboGIRMFvSrpKmAtOAH5YYq5mZjVBaFVNEbJZ0GbCYrJvrTRGxStK1wGBELAK+CNySN0I/T5ZEyI/7GlmD9mbg0kb3YDIzs9q8opyZWQ+rtaJcT4+kNjOz0TlBmJlZIScIMzMr5ARhZmaFnCDMzKyQE4SZmRVygjAzs0JOEGZmVsgJwszMCjlBmJlZoa6ZakPSBuAnBbv2BX7R5HDaSS9fv6+9d/Xy9dd77QdHxOSiHV2TIEYjaXC0eUZ6QS9fv6+9N68devv6G3ntrmIyM7NCThBmZlaoFxLEglYH0GK9fP2+9t7Vy9ffsGvv+jYIMzMbn164gzAzs3FwgjAzs0JdnSAknSVpjaS1kua2Op5Gk3SgpPskrZa0StIH8/K9Jd0r6fH856S8XJL+V/77eETS77f2CnacpAmSlkv6Rr49VdIP8mv8qqRd8vJd8+21+f4prYy7ESRNlHSHpP+U9Jikt/TKZy/pf+T/5h+VdKuk3br5s5d0k6SfS3q0qqzuz1rSnPz4xyXNGet9uzZBSJoA3AD8IXAkcIGkI1sbVcNtBv4yIo4ETgAuza9xLvDdiJgGfDffhux3MS1/XAJ8rvkhN9wHgceqtj8BfDoiXge8AFycl18MvJCXfzo/rtNdD3wrIo4AjiH7PXT9Zy+pH/gLYCAi3ghMAGbT3Z/9l4CzRpTV9VlL2hu4GngzcDxwdSWpjCoiuvIBvAVYXLV9FXBVq+Mq+Zr/L3AGsAbYLy/bD1iTP78RuKDq+FeO68QHcED+H+NU4BuAyEaQ7jTy3wCwGHhL/nyn/Di1+hp24Nr3Ap4ceQ298NkD/cDTwN75Z/kNYGa3f/bAFODR8X7WwAXAjVXl2xxX9OjaOwi2/iOqWJ+XdaX8tnk68APgtRHx03zXz4DX5s+77XfyGeCvgJfz7X2AjRGxOd+uvr5Xrj3f/8v8+E41FdgA/HNexfYFSa+mBz77iBgCPgk8BfyU7LNcRu989hX1ftZ1/xvo5gTRMyTtAfwr8KGIeLF6X2R/KnRdX2ZJbwd+HhHLWh1Li+wE/D7wuYiYDvyarVUMQFd/9pOAc8iS5P7Aq9m++qWnlPVZd3OCGAIOrNo+IC/rKpJ2JksOX4mIO/PiZyXtl+/fD/h5Xt5Nv5MZwNmSfgzcRlbNdD0wUdJO+THV1/fKtef79wKea2bADbYeWB8RP8i37yBLGL3w2Z8OPBkRGyJiE3An2b+HXvnsK+r9rOv+N9DNCWIpMC3v2bALWSPWohbH1FCSBHwReCwiPlW1axFQ6aEwh6xtolL+7ryXwwnAL6tuUTtKRFwVEQdExBSyz3ZJRLwTuA84Lz9s5LVXfifn5cd37F/XEfEz4GlJh+dFpwGr6YHPnqxq6QRJu+f/ByrX3hOffZV6P+vFwJmSJuV3YWfmZaNrdcNLyY06bwN+BDwBfLTV8ZRwfSeS3VY+AqzIH28jq1/9LvA48B1g7/x4kfXsegJYSdYLpOXX0YDfwynAN/LnhwA/BNYCtwO75uW75dtr8/2HtDruBlz3scBg/vkvBCb1ymcPXAP8J/AocAuwazd/9sCtZO0tm8juHi8ez2cNvCf/PawF/nys9/VUG2ZmVqibq5jMzGwHOEGYmVkhJwgzMyvkBGFmZoWcIMzMrJAThPUkSVskrZD0sKSHJL11nK9zSmUm2TGOO17S95TNLlyZGmP38bxnjfe4SNL+jXxN6207jX2IWVcajohjASTNBOYBJ5fxRpJeS9YPf3ZE/Ededh7wGuA3DXyri8jGBTzTwNe0HuY7CDPYk2x66Mpc+tfl6wyslHR+rfJqko7L7w4OHbHrUuDmSnIAiIg7IuLZfE7/hfm8/Q9KOjp/rY9JuqLqtR+VNCV/PCbpn5Sth/BtSX15whkAvpLfGfU1/LdkPcd3ENar+iStIBtlux/ZXE4A55KNUD4G2BdYKul7wFtHKQcgr6L6R+CciHhq
xHu9Ebh5lDiuAZZHxCxJpwJfzt+nlmlk0zS/T9LXgD+JiH+RdBlwRUQMJly/2ZicIKxXVVcxvQX4sqQ3kk1fcmtEbCGbDO0B4Lga5S8CrwcWAGdGRL3VOycCfwIQEUsk7SNpzzHOeTIiVuTPl5GtE2DWcK5isp6XV/3sC0we50v8FPgt2XocRVYBb6rzNTez7f/P3aqev1T1fAv+Q89K4gRhPU/SEWTLVj4HfB84X9la15OBk8gmeButHGAj8EfAPEmnFLzFZ4E5kt5c9Z7n5o3X3wfemZedAvwisjU9fkw2fTfK1hSemnApvyJr+DZrCP/lYb2q0gYB2eyXcyJii6Svky1X+TDZTLl/FRE/q1F+BEDe4Px24B5J74mt6zRU9s0GPinp98hWwPse8C3gY8BNkh4h69FUmb75X8mmbF5FtkrgjxKu6UvA5yUNky2xOTy+X41ZxrO5mplZIVcxmZlZIScIMzMr5ARhZmaFnCDMzKyQE4SZmRVygjAzs0JOEGZmVuj/A/41wZI4LKusAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "book_counts = range(30, 1000, 30)\n",
    "run_times = [measure_run_time(book_count) \n",
    "             for book_count in book_counts]    \n",
    "plt.scatter(book_counts, run_times)\n",
    "plt.xlabel('Book Count')\n",
    "plt.ylabel('Running Time (Seconds)')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The running time of the similarity computation rises quadratically with the book count. Our plotted curve takes on a parabolic shape defined by `y = n * (x ** 2)`, where `n` is a constant.\n",
    "\n",
    "**Listing 13. 50. Modeling running times using a quadratic curve**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYgAAAEGCAYAAAB/+QKOAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAAgAElEQVR4nO3de5yPdf7/8cfL5FQSciinUEpthZpUtJ2dUqavFdkOikIHHZF27WZ1tLXFb7eTjbZs0VKJiMqQrJRxKKUUOhkrRZNiiPH6/fG5Rh/TZ2auGZ/PfObwvN9un9t8rut6X9fndc3F5zXX9T6ZuyMiIpJXpWQHICIipZMShIiIxKQEISIiMSlBiIhITEoQIiIS0wHJDiBe6tat682aNUt2GCIiZcrSpUu/c/d6sbaVmwTRrFkzMjIykh2GiEiZYmZf5rdNj5hERCQmJQgREYlJCUJERGJSghARkZiUIEREJCYlCBERiUkJQkREYlKCEBGRmJQgREQkJiUIERGJSQlCRERiUoIQEZGYlCBERCQmJQgREYlJCUJERGIqN/NBiIiUBdOWZ/LgnNVsyMqmYa3qDO18DBe3bZTssGJSghARKSHTlmdy50sryd6VA0BmVjZ3vrQSoFQmCT1iEhEpIQ/OWb03OeTK3pXDg3NWJymigilBiIiUkA1Z2UVan2xKECIiJaRhrepFWh/Gt99+y+bNm4u9f0GUIERESsjQzsdQvXLKPuuqV05haOdjinW8bdu2ceGFF9KxY0f27NkTjxD3oUpqEZESklsRHY9WTLt376Z3795kZGTw8ssvU6lS/P/eV4IQEYmDsM1XL27baL9bLLk71113HTNnzuSJJ56ge/fu+3W8/ChBiIjsp5Juvjpq1CieeuopRowYwcCBA+N+/FyqgxAR2U8l2Xx1/PjxjBw5kquuuopRo0bF/fjRlCBERPZTSTVfnTlzJgMHDqRLly6MGzcOM4vr8fNSghAR2U+JaL6a13vvvUevXr1o06YNU6ZMoXLlynE7dn4SmiDMrIuZrTazNWY2PMb228xslZl9YGZzzeyIqG05ZrYieE1PZJwiIvsj3s1X81qzZg3dunWjQYMGzJw5kxo1asTluIVJWCW1maUAjwIdgfXAEjOb7u6roootB1LdfbuZXQf8FegdbMt29zaJik9EJF7i2Xw1r02bNtGlSxcA5syZQ4MGDfb7mGElshVTO2CNu68DMLPJQBqwN0G4+7yo8ouByxMYj4hIwsSj+WpeP/30E926dWPDhg3MmzePli1bxvX4hUnkI6ZGwNdRy+uDdfnpD7wWtVzNzDLMbLGZXZyIAEVESqtdu3bRu3dvli1bxgsvvMCpp55a4jGUin4QZnY5kAqcFbX6CHfPNLMWQLqZrXT3tXn2GwAMAGjatGmJxSsikkh79uzhmmuuYdasWTz55JNcdNFFSYkjkXcQmUCTqOXGwbp9mNn5wB+B7u6+M3e9u2cGP9cB84G2efd193HunuruqfXq1Ytv9CIiSeDuDBs2jGeffZZRo0YxYMCApMWSyASxBGhpZs3NrApwKbBPayQzaws8SSQ5bIpaX9vMqgbv6wIdiKq7EBEpr/7617/yt7/9jcGDBzNixIikxpKwR0zuvtvMbgTmACnABHf/yMxGARnuPh14EKgBTAk6fHzl7t2BY4EnzWwPkST2QJ7WTyIi5c748eMZPnw4ffr0YcyYMQnvCFcYc/ekBhAvqampnpGRkewwRESK5eWXX6Znz5507NiR6dOnU6VKlRL5XDNb6u6psbapJ7WISJLNnz+fPn360K5dO1588cUSSw6FUYIQEUmiZcuW0b17d4488khmzpzJQQcdlOyQ9lKCEBFJks8++4wuXbpQu3Zt5syZQ506dZId0j6UIEREkmDDhg106tQJd+eNN96gcePGyQ7pV0pFRzkRkYrk+++/p3Pnznz33XfMnz+fo48+
OtkhxaQEISJSgnLHV/r000957bXXOPnkk5MdUr6UIERESsiOHTtIS0vj3XffZcqUKZx77rnJDqlAShAiIiVg165d9OrVi/T0dJ555hl69OiR7JAKpUpqEZEEy8nJ4corr2TGjBk8+uijXHnllckOKRTdQYiIFGLa8sxiTwbk7gwaNIjJkyczevRorr/++gRHGz9KECIiBZi2PJM7X1pJ9q4cADKzsrnzpZUAhSYJd+e2227jqaee4o9//CPDhg1LeLzxpEdMIiIFeHDO6r3JIVf2rhwenLO60H1HjhzJmDFjuOmmm7j77rsTFWLCFHoHYWaVgNZAQyAb+DB6aG4RkfJsQ1Z2kdbneuihhxg1ahT9+vXjkUceSfrIrMWRb4IwsyOBO4Dzgc+Ab4FqwNFmtp3IPA7PuPuekghURCQZGtaqTmaMZNCwVvV893nyyScZOnQovXr1Yty4cVSqVDYf1hQU9T3Av4Ej3b2zu1/u7j3d/USgO3AIcEVJBCkikixDOx9D9cop+6yrXjmFoZ2PiVn+3//+N9dddx3dunVj4sSJpKSkxCxXFuR7B+HufQrYtgkYk5CIRERKkdyK6DCtmF566SWuuuoqzj77bKZMmVJqhu0urjB1EJcAs939RzP7E5G5oe9x92UJj05EpBS4uG2jQlssTZ8+nd69e9OuXTteeeUVqlfP/xFUWRHmwdifguRwBnAeMB54PLFhiYiUHbNmzaJnz56cdNJJzJ49m4MPPjjZIcVFmH4Que27ugHj3H2mmd2TwJhERPbb/nRuK4rXX3+dHj16cOKJJzJnzhxq1qwZ989IljAJItPMngQ6AqPNrCrqPyEipdj+dG4rirlz55KWlkarVq14/fXXqVWrVtyOXVIJriBhvuh7AXOAzu6eBdQBhiY0KhGR/bA/ndvCeuutt7jooos46qijePPNN+M6G1xugsvMysb5JcFNW54Zt88II98EYWZ1zKwOkb4P84HNwfJOIKNkwhMRKbridm4La+HChXTr1o1mzZoxd+5c6tatG5fj5iqJBBdGQY+YlgIOGNAU+D54Xwv4Cmie8OhERIqhOJ3bwlq8eDEXXHABjRo1Yu7cudSvX3+/j5lXohNcWPneQbh7c3dvAbwJXOTudd39UOBC4PWSClBEpKiK2rktrIyMDDp37kz9+vVJT0/n8MMP36/j5Se/RBaPBFcUYeogTnP3WbkL7v4a0D5xIYmI7J+L2zbi/h4n0KhWdQxoVKs69/c4Yb8qeZcvX07Hjh059NBDmTdvHo0aJa7COFEJrqjCtGLaYGYjiAy7AXAZsCHMwc2sCzAWSAGecvcH8my/DbgG2E1krKd+7v5lsK0vMCIoeo+7PxPmM0VEIFzntrCWLVvG+eefT82aNUlPT6dJkyZxOW5+itJ7O5HM3QsuEKmYvgs4M1i1APiLu28pZL8U4FMizWPXA0uAPu6+KqrMOcC77r7dzK4Dznb33sFnZgCpROpBlgInu/v3+X1eamqqZ2So7lxE4isjI4OOHTtyyCGHMG/ePJo3L1/Vr2a21N1TY20r9A4iSAQ3F+Nz2wFr3H1dEMRkIA3YmyDcfV5U+cXA5cH7zsAbuUnIzN4AugCTihGHiEixvPvuu3Tu3Jk6deqQnp5Os2bNkh3SPhLdVyLMWExHA0OAZtHl3f3cQnZtBHwdtbweOLWA8v2B1wrY91dnbWYDgAEATZs2LSQcEZHw3nnnHTp37ky9evWYN29eqfuOKYnOgGHqIKYATwBP8cuwG3FlZpcTeZx0VlH2c/dxwDiIPGJKQGgiUgEtXLiQrl27cthhhzFv3jwaN26c7JB+paC+EiWZIHa7e3EG58sEomtyGgfr9mFm5wN/BM5y951R+56dZ9/5xYhBRKRIFixYsLefw7x582jYsGGyQ4qpJPpKhGnmOsPMrjezw3N7VweVyIVZArQ0s+ZmVgW4FJgeXcDM2hKZma57nmlM5wCdzKy2mdUGOgXrREQSZv78+XTt
2pUmTZowf/78UpscoGT6SoRJEH2JjL20iEhroqWEGGrD3XcDNxL5Yv8Y+I+7f2Rmo8yse1DsQaAGMMXMVpjZ9GDfLcDdRJLMEmBUYa2mRET2x9y5c7ngggto1qwZ8+fPT1gnuHgpib4ShTZzLSvUzFVEiuv1118nLS2Nli1b8uabbyZk+IxEiEcrpv1q5mpmlYHr+KUfxHzgSXffVaQoRERKoVmzZtGjRw9atWrFm2++GfeB9xIpnp0BYwnziOlx4GTgseB1MppRTkTKgSlTppCWlsbxxx+fkFFZy7owrZhOcffWUcvpZvZ+ogISESkJ//rXv+jfvz/t27fn1Vdf5ZBDDkl2SKVOmDuIHDM7MnfBzFqQoP4QIiIl4dFHH+Xqq6/mvPPOY/bs2UoO+QhzBzEUmGdm64jMB3EEcHVCoxIRSZDRo0czfPhwunfvzgsvvEC1atWSHVKpFWYsprlm1hLIbTu1OqpDm4hImeDu/PnPf+aee+6hT58+PPPMM1SuXDnZYZVqhT5iMrMbgOru/oG7fwAcaGbXJz40EZH4cHduvfVW7rnnHq655homTpyo5BBCmDqIa909K3chGHL72sSFJCISPzk5OVx77bWMHTuWW265hXHjxpGSklL4jhIqQaSYmeUuBPM8VElcSCIi8bFr1y4uv/xyxo8fz5/+9Ccefvhhor7OpBBhKqlnAy+Y2ZPB8sBgnYhIqZWdnU3v3r2ZMWMGo0ePZtiwYckOqcwJkyDuIJIUrguW3yAy9LeISKmUlZVF9+7dWbhwIY899hjXXXddzHKJnnCnrAvTimmPmf0LSHf31YkPSUSk+DZu3EiXLl1YtWoVkydPplevXjHLlcSEO2VdmFZM3YEVBI+VzKxN7qirIiKlydq1a+nQoQNr1qxh5syZ+SYHKHjCHYkIU0l9F5H5pbMA3H0FUL5m7RaRMm/FihV06NCBH374gfT0dDp27Fhg+ZKYcKesC5Mgdrn7D3nWlY8xwkWkXFiwYAFnnXUWlStX5u2336Zdu3aF7lMSE+6UdWESxEdm9nsizV1bmtnfiUweJCKSdK+88gqdOnWiYcOGLFq0iGOPPTbUfiUx4U5ZFyZBDAZ+A+wEJgFbgVsSGZSISBhPP/00PXr0oHXr1rz99ts0adIk9L4Xt23E/T1OoFGt6hjQqFZ17u9xgiqooxRpRrlgfugsL4XT0GlGOZGK5cEHH2TYsGF07NiRl156iRo1aiQ7pDKpoBnl8r2DMLM/m1mr4H1VM0sH1gDfmNn5iQlVRKRge/bs4dZbb2XYsGF7O8IpOSRGQY+YegO57b36BmXrA2cB9yU4LhGRX9mxYwe9e/dmzJgxDB48mOeee46qVasmO6xyq6COcj9HPUrqDExy9xzgYzML0wNbRCRutmzZQlpaGgsXLuRvf/sbt956q8ZVSrCCvuh3mtnxwDfAOcCQqG0HJjQqEZEoX3zxBV27dmXdunVMnjyZ3r17JzukCqGgBHELMBWoBzzi7p8DmNkFwPISiE1EhGXLltGtWzd27NjBG2+8wZlnnpnskCqMfBOEuy8GWsVYPwuYlcigREQAZs+eTc+ePTn00EOZO3cuxx13nAbYK0EFtWK63Ap4wGdmR5rZGYkJS0QqugkTJnDhhRfSsmVL3nnnnb3J4c6XVpKZlY3zywB705ZnJjvccqmgVkyHAivMbIKZ3WBmvczsSjMbZWZvAX8lUj+RLzPrYmarzWyNmQ2Psf1MM1tmZrvNrGeebTlmtiJ4aXBAkQrC3Rk5ciT9+/fnvPPOY8GCBTRs2BDQAHslraBHTGPN7B/AuUAH4EQgG/gYuMLdvyrowMHMc48CHYH1wBIzm+7uq6KKfQVcxb4V4Lmy3b1NEc5FRMq4n3/+mUGDBvH0009z9dVX8+STT+4zd7QG2CtZBTZXDZq1vhG8iqodsMbd1wGY2WQgDdibINz9i2DbnmIcX0TKkS1btvC73/2O+fPnc9ddd3HXXXf9qhlrw1rV
yYyRDDTAXmKEGYupuBoBX0ctrw/WhVXNzDLMbLGZXRyrgJkNCMpkfPvtt/sTq4gk0WeffcZpp53GokWL+Pe//83IkSNj9nHQAHslqzR3eDvC3TPNrAWQbmYr3X1tdAF3HweMg8hYTMkIUkT2z/z58+nRowcpKSmkp6fToUOHfMvmtlZSK6aSkcgEkQlED63YOFgXirtnBj/Xmdl8oC2wtsCdRKRMefrppxk4cCBHHXUUr776Ki1atCh0n4vbNlJCKCFhphxtYGbjzey1YPk4M+sf4thLgJZm1tzMqgCXAqFaI5lZbTOrGryvS6SSfFXBe4lIWbFnzx6GDx9Ov379OPvss1m0aFGo5CAlK0wdxL+AOUDDYPlTQswH4e67gRuDfT8G/uPuHwXNZLsDmNkpZrYeuAR40sw+CnY/Fsgws/eBecADeVo/iUgZtX37di655BJGjx7NoEGDmDlzJrVq1QJg2vJMOjyQTvPhM+nwQLr6NyRZofNBmNkSdz/FzJa7e9tg3YrS1gRV80GIlH4bNmyge/fuLFu2jIcffpibb755b2V0bie46H4O1SunaBKfBCvWfBBRtpnZoQTzUJvZaUDeOapFRAq0bNkyTj31VD755BOmT5/OLbfcsk9LJXWCK33CVFLfRqTu4Egz+y+Rwft6FryLiMgvJk+eTL9+/ahbty7//e9/ad269a/KqBNc6VPoHYS7LyMySVB7YCDwG3f/INGBiUjZl5OTw5133kmfPn1ITU0lIyMjZnKA/Du7qRNc8oRpxZQCXACcB3QCBpvZbYkOTETKth9++IHu3bvzwAMPMGjQIN58803q16+fb3l1git9wjximgHsAFYCGhJDRAq1evVq0tLSWLt2LY8//jiDBg0qdB91git9wiSIxu5+YsIjEZFyYdasWfTp04eqVauSnp7Ob3/729D7qhNc6RKmFdNrZtYp4ZGISJnm7owePZoLL7yQFi1asGTJkiIlByl9wtxBLAZeNrNKwC7AAHf3mgmNTETKjO3bt9O/f/+980VPmDCBAw/U1PVlXZg7iIeB04ED3b2mux+s5CAiudatW0eHDh144YUXuP/++5k0aZKSQzkR5g7ia+BDL6zLtYhUOLNmzeKyyy4DYMaMGXTr1i3JEUk8hUkQ64D5wWB9O3NXuvvDCYtKREq1nJwcRo0axahRo2jTpg0vvviiBtsrh8IkiM+DV5XgJSIV2ObNm7nsssuYM2cOffv25fHHH6d69YI7s01bnqnmq2VQoQnC3f9SEoGISOmXkZFBz549+d///scTTzzBgAEDYs78Fi3vIHyZWdnc+dJKACWJUi7fSmozGxP8nGFm0/O+Si5EESkNnnrqKTp06MCePXtYuHAhAwcOLDQ5gAbhK8sKuoOYGPx8qCQCEZHSKTs7mxtvvJEJEybQsWNHnn/+eerWrRt6fw3CV3YVlCAGA1e5+1slFYyIlC7r1q3jkksuYdmyZYwYMYKRI0eSkpJS+I5RGtaqTmaMZKBB+Eq/gvpBaHgNkQps6tSptG3blrVr1zJ9+nTuvvvuIicH0CB8ZVlBdxAHmllbIj2nfyUYBlxEypkdO3Zw++2389hjj9GuXTteeOEFmjVrlm/5wlooaRC+squgBNEI+BuxE4QD5yYkIhFJmk8//ZRevXrx/vvvc/vtt3PfffdRpUr+rdvDtlDSIHxlU0EJYo27KwmIVBDPP/88AwcOpEqVKsyYMYMLL7yw0H0KaqGkhFD2hekoJyLl2Pbt27npppsYP348Z5xxBpMmTaJx48ahOrephVL5VlAl9R0lFoWIJMWqVato164dEyZM4A9/+APz5s3bmxzufGklmVnZOL88Opq2PHOf/TVNaPmWb4Jw99dLMhARKTnuzoQJE0hNTWXTpk3Mnj2be++9lwMOiDxUCNu5TS2Uyjc9YhKpYLZs2cLAgQOZOnUq55xzDs899xyHH374PmXCPjpSC6XyTQlCpAJJT0/nyiuv5JtvvuGBBx5g
yJAhMfs2FKVzm1oolV+FThiUz1hME83sZjOrVsi+XcxstZmtMbPhMbafaWbLzGy3mfXMs62vmX0WvPoW/dREKp5pyzPp8EA6zYfPpMMD6XvrDHbu3MmwYcM4//zzqVGjBosXL+aOO+7It+ObHh0JhJ8Poh4wKVjuDfwIHA38E7gi1k5mlgI8CnQE1gNLzGy6u6+KKvYVcBUwJM++dYC7gFQifS6WBvt+H+60RCqe/PokfL3uMybccysrVqxg0KBBPPTQQxx00EEFHkuPjgTCJYj27n5K1PIMM1vi7qeY2UcF7NeOSF+KdQBmNhlIA/YmCHf/Iti2J8++nYE33H1LsP0NoAu/JCkRySNvxbK7s+nd6dzywHjq1KrJK6+8Qvfu3UMfT4+OJEyCqGFmTd39KwAzawrUCLb9XMB+jYhMV5prPXBqyLhi7furf6lmNgAYANC0adOQhxYpn6IrkHO2ZbH5tbFkr11CteYnsXLRTA477LAkRidlUZgEcTuw0MzWEhl2ozlwvZkdBDyTyOAK4+7jgHEAqampmjNbKrTciuXta95l82t/Z8/ObdQ+bwCtzr1EyUGKJcyMcrPMrCXQKli12t13BO/HFLBrJtAkarlxsC6MTODsPPvOD7mvSIV0ffvDuH7wzWz94E0q12tGg0vv4ZCGRzKs67HJDk3KqLDNXE8GmgXlW5sZ7v5sIfssAVqaWXMiX/iXAr8P+XlzgPvMrHaw3Am4M+S+IhXO7NmzueOaa/hp40Yan3s5KSf9jkaH1lTFsuyXQhOEmU0EjgRWALk1YA4UmCDcfbeZ3Ujkyz4FmODuH5nZKCDD3aeb2SnAy0Bt4CIz+4u7/8bdt5jZ3USSDMCo3AprEfnF1q1bGTJkCP/85z857rjjmDZtGqmpqYXuF2acJRFzL/jRvZl9DBznhRVMstTUVM/IyEh2GCIJk/dLvXPt7xh/31DWr1/P0KFDGTlyJNWqFdg1ae9xopvDQqSPw/09TlCSqIDMbKm7x/yrIswjpg+Bw4D/xTUqEQkt+kt9z8/ZrJzyOIuWzaThES1YuHAhp59+euhjaYhuCStMgqgLrDKz94CduSvdPXyDahHZL7lf6ju+/pDNs8awO+sbDk5No8lFA4qUHEBDdEt4YRLEyEQHISIFW//Nd2x56xl+Wj6LA2odRoPf30+1JsfzzfaiP/ktyjhLUrGFaeb6VkkEIiKxvfLKK2yccAM//7iFg1PTqPXby6lUJfJlXpwv9aGdj4lZB6FxliSvfBOEmS109zPM7EcirZb2bgLc3WsmPDqRCmzjxo0MHjyYqVOn0qzlsXjPEVDvqL3bi/ulrnGWJKx8E4S7nxH8PLjkwhGR3Ml8hgwZQnZ2Nvfddx9Dhgxh5oeb4valrnGWJIxQHeWCkVkbRJfPHZtJROLns88+Y+DAgcybN4+zzjqLcePGcfTRRwP6UpeSF6aj3GAiQ29/A+SOuurAiQmMS6RC2bVrFw8//DAjR46katWqjBs3jv79+1OpUqFTtogkTJg7iJuBY9x9c6KDEamI7n3qRe4dMYTsb76gzvG/5cG/jaFfp5OSHZZIqATxNfBDogMRKY8KGtJi48aNXNr/Bt6a9RIpNetTr8cIDmx5Gg++vYk69TL1OEmSLuyMcvPNbCb7dpR7OGFRiZQD+c3wlrN7NxsWT2fEiBH8uG07NU/vxSGn96JS5cgwGerVLKVFmATxVfCqErxEJIRYQ1pkffERfdMGs+1/a+jYsSOrmvXkgDq/TgTq1SylQZiOcn8piUBEypt9Znjb/gNZbz3DTx+8TkqNQ/nPf/5Dz549OWP0PPVqllIrTCumo4Eh/DIfBADufm7iwhIp+xrWqs76LT/x0wevk/XWs+z5eTs12/Xg2Auu5pJLLgTUq1lKtzCPmKYATwBP8ct8ECIVWpj5FLrU+Y57HxnOzk2fU7XJ8dTpeB2HNGzB8O4n7C2jXs1SmoVJELvd/fGERyJSRuRX+QyRL/w1a9Yw
dOhQpk2bRr3DG1P7939mZ+NTaFT7wJhf/uoAJ6VVmAQxw8yuJzLzW3QrJs3wJhVSfvMp3P/KUhZNGsuYMWOoUqUK9957L7feeivVq6s+QcqmMAmib/BzaNQ6B1rEPxyR0i9vCyPfk8NPH7zB129PZEn2Vq666iruvfdeDj/88CRFKBIfYVoxNS+JQETKiuj5FHZ8+QFb5o5j17dfcHCz45k39V+cfPLJSY5QJD7CtGK6MtZ6d382/uGIJF9hFdBDOx/DrU9M55u5T5O9dgkpNevTsMcf+Mcfr+PkkxonMXKR+ArziOmUqPfVgPOAZYAShJQ7hVVAf/nll7w85s98OXEilaoeSK2z+tLqvEu548ITVNEs5U6YR0yDo5fNrBYwOWERiSRRfhXQ9730Hm89u4DHHnsMM2PIkCEMHz6cOnXqJClSkcQLNR9EHtsA1UtIuZS3AnrPz9lsXTKNr957iaW7d3L11VczcuRIGjfWoyQp/8LUQczglylHKwHHEek8J1KmhOncllsB7Tm7+en9OWQtmsSebVnUOa4DC6f+k2OPPTZJ0YuUvDB3EA9Fvd8NfOnu6xMUj0hCFFa3kOu2847kxrv/H9+9/Ty7szZStcnxHHbJnxlzUy+OPVZ1DFKxhKmDeCt62cwqmdll7v5cYfuaWRdgLJACPOXuD+TZXpVIZffJwGagt7t/YWbNgI+B1UHRxe4+qPDTkYqqsLuD/OoWcofV3r17N8899xx33303G9eu5aCGR1Hn/Ls48qTfMqxLK1VAS4WUb4Iws5rADUAjYDrwRrA8BHgfKDBBBPNYPwp0BNYDS8xsuruviirWH/je3Y8ys0uB0UDvYNtad29TrLOSCiXM3UF+w2dnbvmJZ599lrvvvps1a9bQtm1bXnnlFS666CLMrGROQKSUKugOYiLwPfAOcA3wB8CAi919RYhjtwPWuPs6ADObDKQB0QkiDRgZvJ8K/MP0v1KihKk3KOzuAPbt3AaR3s/bVs1n2+L/0HdzJm3atGHatGl0795diUEkUFCCaOHuJwCY2VPA/4Cm7r4j5LEbEZmuNNd64NT8yrj7bjP7ATg02NbczJYDW4ER7v523g8wswHAAICmTZuGDEvKirD1BvndHUSvzx1We0WHHEUAABDYSURBVPvOn9m26i1+WDSZ3d9voNnRx/HIU/8gLS1NiUEkj0oFbNuV+8bdc4D1RUgO+ys3GbUFbgOeDx557cPdx7l7qrun1qtXr4RCk5JS0J1BtPwm14le37lVHc7KWc434wexeebDVKlWneF/e4q1H6/k4osvVnIQiaGgBNHazLYGrx+BE3Pfm9nWEMfOBJpELTcO1sUsY2YHAIcAm919p7tvBnD3pcBa4OhwpyTlRZg7A4jcHVSvnLLPutxJd7Kysrjvvvto1qwZT97/B9q0bMrLL7/Mj+s/5f7b+lOpUkH/BUQqtnwfMbl7Sn7bQloCtDSz5kQSwaXA7/OUmU5ktNh3gJ5Auru7mdUDtrh7jpm1AFoC6/YzHilj8tYbRK+PFmvSnX5ta/Lf58dwxRNP8NNPP9G1a1fuuOMOzjzzTN0tiIRUnJ7UoQR1CjcCc4g0c53g7h+Z2Sggw92nA+OBiWa2BthCJIkAnAmMMrNdwB5gkOafqHiKMh1n7qQ7q1ev5sEHH+S6Pz9LTk4OvXv35o477qB169YlGbpIuWDuXnipMiA1NdUzMjKSHYYQruVR2HJhyrg7CxYsYOzYsUybNo2qVavSr18/br/9dlq00LQlIgUxs6XunhpzmxKExFPelkcQ+av//h77jnYatlxBduzYwaRJkxg7dizvv/8+derUYdCgQdx8883Ur18/ficlUo4VlCBUQydxFbblUdhysWzYsIERI0bQpEkT+vXrx+7duxk3bhxff/019957r5KDSJwkrA5CKqawLY/Clov27rvvMnbsWKZMmUJOTg4XXXQRN998M+ecc44qnkUSQAlCiqSwOoGwLY/ClsvOzmbK
lCk8+uijvPfee9SsWZMbb7yRG2+8kSOPPDJOZyUisegRk4SWW2+QmZWN80vP5mnLf+neUlCfhGiFlVu5ciWDBw+mYcOG9O3bl6ysLP7+97+zfv16HnnkESUHkRKgOwgJLcyYR7H6JMRqeRSr3OAzm/D9itc5/fpxLF68mCpVqtCzZ0+uvfZazjrrLD1GEilhShASWth6g9w+CYXJLbdixQrGjRvHwK7PsXXrVlq1asXDDz/MFVdcQd26deMSu4gUnRKEhBa23iCMTZs2MXnyZCZOnEhGRgZVq1alV69eDBgwgA4dOuhuQaQUUIKoAOLVIa0oPZtjyc7OZsaMGUycOJHXXnuNnJwc2rRpw5gxY7jiiiuoU6fO/p+siMSNEkQ5F2bI7LDDaoetX4i2Z88e3n77bSZOnMiUKVPYunUrjRo14vbbb+eKK67g+OOPj/9Ji0hcKEGUc2EqlsOUyRWmfsHd+eijj5g0aRLPPfccX375JTVq1OB3v/sdV1xxBWeffTYpKfs7FqSIJJoSRDkXpmK5OJ3W8nJ33n//faZOncrUqVNZvXo1lSpVolOnTtx3332kpaVx0EEHFS14EUkqJYhyLkzFcnErn92dpUuX7k0Ka9eupVKlSpxzzjnccsst/N///R8NGjTY/5MQkaRQgijnwlQsF6XyOScnh/fee48XX3yRqVOn8uWXX5KSksJ5553H8OHDSUtLQ7P7iZQPShDlXJiK5cLKZGVl8frrr/Pqq6/y2muv8d1331G5cmU6duzIXXfdRffu3Tn00EN//eEiUqZpuO9SKl5NU4vD3fnkk0949dVXmTlzJgsXLiQnJ4c6derQtWtXunXrRteuXalVq9Z+f5aIJFdBw33rDqIUimfT1LC2bt3KggULmDNnDjNnzuTzzz8H4MQTT2TYsGF069aN0047Ta2PRCoQJYhSKN5NU2PJzs5m0aJFpKenk56ezpIlS8jJyaF69eqcd955DBs2jAsuuICmTZvG78REpExRgihhYR4LJaJp6q5du8jIyCA9PZ25c+eyaNEidu7cSUpKCu3atePOO+/k3HPP5fTTT6datWrFPDsRKU+UIEKI17P+sI+F4tE0dfPmzSxevJh33nmHRYsW8d5777Ft2zYAWrduzQ033MC5557LmWeeycEHH1zkcxGR8k8JohBFedZfWCIJ+1ioqE1T3few67uv8Y2rqXbAN7RqdT2rV0em7kxJSaF169ZcddVVnHXWWZxzzjkaIVVEQlGCKETYL/UwiaQow2XnfnbeZLN7924+/fRTtq1azjFfvs38Re+xdf2n+M7I3cHPderQvn17rrzyStq3b88pp5yiHswiUiwVPkEU9ld/2C/1MImkKD2WL27biE7H1ObDDz9k+fLlzH7yP9y/fDkrV64kOztyjKpVq3L88cdz0rm/p3379rRv356WLVtqqGwRiYsKnSDC/NUf9ks9TCKJ9eio2gGVuDa1Dm+99RaffPIJn3zyCatXr+aTTz7hiy++ILefSq1atWjTpg2DBg2iTZs2tG3bllatWlG5cuX9+A2IiOSvQieIMH/1hx2GoqBEsmvXLjIzM6m99Us6VVnNy4tXsGXj19gPG/CsDVx97w97yx944IEcc8wxnHbaafTt25cTTzyRtm3bcsQRR+jOQERKVEIThJl1AcYCKcBT7v5Anu1VgWeBk4HNQG93/yLYdifQH8gBbnL3OfGOL8xf/QXVB7g7P/74I9988w0XNtjKo0uXkr11C7t//I6cHzax58dv2bbre6r9cSN79uzZ5zMaNmxIq1ataNXqnOBnK4455hgaN25MpUqV4n2qIiJFlrAEYWYpwKNAR2A9sMTMprv7qqhi/YHv3f0oM7sUGA30NrPjgEuB3wANgTfN7Gh33/fP/f3UsFZ11n+/jT3bfmDPz9vZ83M2/vN2alfO4fnnf2Dr1q38+OOP/Pjjj5yydStZWVls+mAT90zdxE2bNrFp0yZ27twZ4+QrUbVWfY5q
0YyTf3MyRxxxxD6vJk2aqK+BiJR6ibyDaAescfd1AGY2GUgDohNEGjAyeD8V+IdFnqOkAZPdfSfwuZmtCY73TjwDHNr5GIY+u4A1j16xz/pvgMue/WXZzKhRowa1atWifv36NGjQgBNOOIH69evHfB122GEakkJEyrxEJohGwNdRy+uBU/Mr4+67zewH4NBg/eI8+/6qZ5qZDQAGAMUaEuLito3Ytes0hn9+M1tzDqBu7Vr8/oyj6XpSCw4++GBq1qzJwQcfzIEHHqjHPiJS4ZTpSmp3HweMg8horsU5xiXtmnPJtDFxjUtEpDxI5J/FmUCTqOXGwbqYZczsAOAQIpXVYfYVEZEESmSCWAK0NLPmZlaFSKXz9DxlpgN9g/c9gXSPNPyfDlxqZlXNrDnQEngvgbGKiEgeCXvEFNQp3AjMIdLMdYK7f2Rmo4AMd58OjAcmBpXQW4gkEYJy/yFSob0buCHeLZhERKRgmlFORKQCK2hGOTXNERGRmJQgREQkJiUIERGJSQlCRERiUoIQEZGYlCBERCQmJQgREYlJCUJERGJSghARkZiUIEREJKZyM9SGmX0LfBljU13guxIOpzSpyOevc6+4KvL5F/Xcj3D3erE2lJsEkR8zy8hvnJGKoCKfv869Yp47VOzzj+e56xGTiIjEpAQhIiIxVYQEMS7ZASRZRT5/nXvFVZHPP27nXu7rIEREpHgqwh2EiIgUgxKEiIjEVK4ThJl1MbPVZrbGzIYnO554M7MmZjbPzFaZ2UdmdnOwvo6ZvWFmnwU/awfrzcz+X/D7+MDMTkruGew/M0sxs+Vm9mqw3NzM3g3O8QUzqxKsrxosrwm2N0tm3PFgZrXMbKqZfWJmH5vZ6RXl2pvZrcG/+Q/NbJKZVSvP197MJpjZJjP7MGpdka+1mfUNyn9mZn0L+9xymyDMLAV4FOgKHAf0MbPjkhtV3O0Gbnf344DTgBuCcxwOzHX3lsDcYBkiv4uWwWsA8HjJhxx3NwMfRy2PBh5x96OA74H+wfr+wPfB+keCcmXdWGC2u7cCWhP5PZT7a29mjYCbgFR3Px5IAS6lfF/7fwFd8qwr0rU2szrAXcCpQDvgrtykki93L5cv4HRgTtTyncCdyY4rwef8CtARWA0cHqw7HFgdvH8S6BNVfm+5svgCGgf/Mc4FXgWMSA/SA/L+GwDmAKcH7w8Iylmyz2E/zv0Q4PO851ARrj3QCPgaqBNcy1eBzuX92gPNgA+Le62BPsCTUev3KRfrVW7vIPjlH1Gu9cG6cim4bW4LvAs0cPf/BZs2Ag2C9+XtdzIGGAbsCZYPBbLcfXewHH1+e8892P5DUL6sag58CzwdPGJ7yswOogJce3fPBB4CvgL+R+RaLqXiXPtcRb3WRf43UJ4TRIVhZjWAF4Fb3H1r9DaP/KlQ7toym9mFwCZ3X5rsWJLkAOAk4HF3bwts45dHDEC5vva1gTQiSbIhcBC/fvxSoSTqWpfnBJEJNIlabhysK1fMrDKR5PCcu78UrP7GzA4Pth8ObArWl6ffSQegu5l9AUwm8phpLFDLzA4IykSf395zD7YfAmwuyYDjbD2w3t3fDZanEkkYFeHanw987u7fuvsu4CUi/x4qyrXPVdRrXeR/A+U5QSwBWgYtG6oQqcSanuSY4srMDBgPfOzuD0dtmg7ktlDoS6RuInf9lUErh9OAH6JuUcsUd7/T3Ru7ezMi1zbd3S8D5gE9g2J5zz33d9IzKF9m/7p2943A12Z2TLDqPGAVFeDaE3m0dJqZHRj8H8g99wpx7aMU9VrPATqZWe3gLqxTsC5/ya54SXClzgXAp8Ba4I/JjicB53cGkdvKD4AVwesCIs9X5wKfAW8CdYLyRqRl11pgJZFWIEk/jzj8Hs4GXg3etwDeA9YAU4CqwfpqwfKaYHuLZMcdh/NuA2QE138aULuiXHvgL8AnwIfARKBqeb72wCQi9S27iNw99i/OtQb6Bb+HNcDVhX2uhtoQ
EZGYyvMjJhER2Q9KECIiEpMShIiIxKQEISIiMSlBiIhITEoQUiGZWY6ZrTCz981smZm1L+Zxzs4dSbaQcu3MbIFFRhfOHRrjwOJ8ZgGfcZWZNYznMaViO6DwIiLlUra7twEws87A/cBZifggM2tApB3+pe7+TrCuJ3AwsD2OH3UVkX4BG+J4TKnAdAchAjWJDA+dO5b+g8E8AyvNrHdB66OZ2SnB3cGReTbdADyTmxwA3H2qu38TjOk/LRi3f7GZnRgca6SZDYk69odm1ix4fWxm/7TIfAivm1n1IOGkAs8Fd0bV4/5bkgpHdxBSUVU3sxVEetkeTmQsJ4AeRHootwbqAkvMbAHQPp/1AASPqP4OpLn7V3k+63jgmXzi+Auw3N0vNrNzgWeDzylISyLDNF9rZv8Bfufu/zazG4Eh7p4R4vxFCqUEIRVV9COm04Fnzex4IsOXTHL3HCKDob0FnFLA+q3AscA4oJO7F/XxzhnA7wDcPd3MDjWzmoXs87m7rwjeLyUyT4BI3OkRk1R4waOfukC9Yh7if8AOIvNxxPIRcHIRj7mbff9/Vot6vzPqfQ76Q08SRAlCKjwza0Vk2srNwNtAb4vMdV0POJPIAG/5rQfIAroB95vZ2TE+4h9AXzM7NeozewSV128DlwXrzga+88icHl8QGb4bi8wp3DzEqfxIpOJbJC70l4dUVLl1EBAZ/bKvu+eY2ctEpqt8n8hIucPcfWMB61sBBBXOFwKvmVk//2WehtxtlwIPmVl9IjPgLQBmAyOBCWb2AZEWTbnDN79IZMjmj4jMEvhpiHP6F/CEmWUTmWIzu3i/GpEIjeYqIiIx6RGTiIjEpAQhIiIxKUGIiEhMShAiIhKTEoSIiMSkBCEiIjEpQYiISEz/H8V9IShjIj6OAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "def y(x):\n",
    "    # Quadratic model y = n * x**2, with n chosen so that y(1000) = 0.27 seconds\n",
    "    return (0.27 / (1000 ** 2)) * (x ** 2)\n",
    "plt.scatter(book_counts, run_times)\n",
    "plt.plot(book_counts, y(np.array(book_counts)), c='k')\n",
    "plt.xlabel('Book Count')\n",
    "plt.ylabel('Running Time (Seconds)')\n",
    "plt.show()"
   ]
  },
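  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The constant in `y(x)` is hardcoded so that `y(1000)` equals 0.27 seconds. Rather than choosing it by hand, `n` could also be estimated from the measured points by least squares. Treating each `book_counts` value as $x_i$ and each `run_times` value as $y_i$, and assuming a model with no intercept, minimizing the squared error has a closed-form solution:\n",
    "\n",
    "$$E(n) = \\sum_i \\left(y_i - n\\,x_i^2\\right)^2, \\qquad \\frac{dE}{dn} = -2\\sum_i x_i^2\\left(y_i - n\\,x_i^2\\right) = 0 \\;\\Rightarrow\\; n = \\frac{\\sum_i x_i^2\\,y_i}{\\sum_i x_i^4}$$\n",
    "\n",
    "In NumPy terms, this would be `np.dot(x2, run_times) / np.dot(x2, x2)` with `x2 = np.array(book_counts) ** 2`. This sketch is not part of the original analysis, and the fitted value may differ slightly from the hand-picked `0.27 / (1000 ** 2)`."
   ]
  },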
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Our plotted equation overlaps with the measured times. Thus, we can use the equation to predict the running time of larger book comparisons. Let's see how long it will take to measure the similarity across 300,000 books.\n",
    "\n",
    "**Listing 13. 51. Predicting the running time for 300K books**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "It will take 6.75 hours to compute all-by-all similarities from a 300000-book by 50000-word matrix\n"
     ]
    }
   ],
   "source": [
    "book_count = 300000\n",
    "run_time = y(book_count) / 3600  # Convert seconds to hours\n",
    "print(f\"It will take {run_time} hours to compute all-by-all similarities \"\n",
    "      f\"from a {book_count}-book by {vocabulary_size}-word matrix\")"
   ]
  },
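  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The printed estimate can be checked by hand. Plugging 300,000 books into `y(x)` and dividing by 3,600 seconds per hour gives:\n",
    "\n",
    "$$y(300{,}000) = \\frac{0.27}{1000^2} \\cdot 300{,}000^2 = 0.27 \\cdot 300^2 = 24{,}300 \\text{ seconds} = \\frac{24{,}300}{3{,}600} \\text{ hours} = 6.75 \\text{ hours}$$\n",
    "\n",
    "Because the model is quadratic, $y(2x) = 4\\,y(x)$: under this model, doubling the number of books quadruples the predicted running time."
   ]
  },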
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It will take nearly 7 hours to compare 300,000 books. Such a delay is not acceptable, especially in industrial NLP systems, which are designed to process millions of texts in mere seconds. We need to reduce the running time. One approach is to reduce the size of the matrix."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
